Abstract: We examine the Gaussian interference channel and
the erasure relay channel. We focus on codes that are non- 
capacity-achieving ("bad") over appropriate point-to-point (two- 
terminal) channels. Over Gaussian point-to-point channels, for 
example, such codes require greater SNR than "good" ones 
to achieve reliable communications, but often exhibit lower 
estimation errors whenever the SNR is below the Shannon limit. 
Over multi-terminal channels, this advantage of "bad" codes at 
lower SNRs can be exploited by strategies that apply estimation, 
at various network nodes, to achieve partial decoding. Such 
strategies include soft partial interference cancelation (soft-IC) 
and soft decode-and-forward (soft-DF). We develop variants of
these two approaches that are amenable to rigorous analysis.
We focus on applications of "bad" LDPC codes. We develop 
analysis tools for soft-DF, including simultaneous density evolution 
(sim-DE), and use standard density evolution to analyze soft-IC. 
We apply our analysis to the design of simple-structured "bad"
codes that outperform more complex "good" ones.

Index Terms — Interference channel, relay channel, Gaussian 
channels, binary erasure channels 



I. Introduction 

Multi-terminal wireless networks have attracted great in- 
terest in recent years, due to the widespread success of 
applications like cellular networks, wireless LANs and sensor 
networks. Research of such networks has drawn heavily from 
existing results on point-to-point channels, of which our un- 
derstanding is much more mature. Point-to-point channels are 
characterized by just two nodes (a source and a destination), 
that wish to communicate. 

In the absence of any additional knowledge, good point- 
to-point codes would appear to be reasonable candidates for 
application to multi-terminal channels as well. An examination 
of the literature, however, reveals that many communication
strategies for such channels rely on bad codes (see Sections
I-A and I-B below). In this paper, we argue that such
codes have inherent benefits that often make them better 
candidates. 

In our analysis, we classify codes as "good" if they achieve 
the capacities of certain point-to-point channels, and "bad"
if they do not (a rigorous definition, which involves code-
sequences, will be provided in Sec. II-D). This choice of termi-
nology, which follows Shamai and Verdú [54], is arbitrary, and
intentionally ignores many other attributes of the codes, like 
availability of low-complexity decoding algorithms (our use of 
quotes in "good" and "bad" reflects this fact). Throughout the 
paper, the context of our classification of codes as "good" or 
"bad" is always point-to-point channels, even when the subject 
of the discussion is multi-terminal channels. 

We begin below with a description of the problem, followed
by our motivation for addressing it in Sec. I-B. Our main
contributions in this paper are summarized in Sec. I-C.

A. Background and Problem Formulation 

To illustrate some of the benefits of "bad" codes in 
multi-terminal scenarios, we begin by considering a point- 
to-point scenario. We focus on the minimum mean-square 
error (MMSE) of an estimate, computed at the destination, 
of the codeword transmitted from the source. That is, we 
assume the destination undertakes a less ambitious task than 
frequently found in information-theoretic literature: It attempts 
to estimate the transmitted codeword as best it can, rather than 
decode it completely. When reliable decoding is possible, the 
MMSE will of course be close to zero. 
Fig. 1. MMSE as a function of SNR in a point-to-point BIAWGN channel.
Details are provided in Appendix U.



In the discussion below, we assume that the channel is 
a binary-input additive white Gaussian noise (BIAWGN) 
channel (later, we will examine an additional model). The 
MMSE is clearly a function of the SNR over this channel. 
Fig. 1 plots the MMSE curve in three different scenarios (see
Appendix U for rigorous details). The first curve corresponds to
a rate-1/2 "good" code¹. Surprisingly, it does not matter which
"good" code. Peleg et al. [45] have shown that the optimal
estimation error with any "good" code can be determined
entirely from its rate. Their proof follows from the relation
between the mutual information and the MMSE, as established
by Guo et al. [23] and also observed by Measson et al. [39].
The second curve is an upper bound on the MMSE of a rate-
1/2 LDPC (2,4) code, which is known to be very "bad"². The
third curve corresponds to transmission of a stream of uncoded
bipolar (BPSK) bits.

As expected, the curve corresponding to uncoded com-
munications upper bounds the two other curves. With such
communications, the destination does not have the benefit of 
a code structure to draw upon in its computations. At SNRs 
above the Shannon limit (SNR > 1.044), the rate- 1/2 "good" 
code's MMSE (which is zero) outperforms the LDPC upper 
bound. This coincides with the intuition that "good" codes are 
better than "bad" ones. However, an interesting phenomenon 
occurs at low SNRs. While the "good" code's MMSE provably 
collapses to that of uncoded communications, the "bad" LDPC 
code exhibits graceful degradation and achieves a substantially 
better MMSE. 
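
The uncoded and "good"-code curves discussed above can be sketched numerically. The following is a minimal illustration, not the computation behind Fig. 1: it assumes SNR = 1/σ² for the channel Y = X + Z with noise variance σ², uses the standard expressions for BPSK in Gaussian noise, and all function names are ours. Under these assumptions, bisection on the capacity gives a rate-1/2 Shannon limit close to the value 1.044 quoted above.

```python
import math

def _gauss_expect(f, lo=-10.0, hi=10.0, steps=4000):
    """E[f(Z)] for Z ~ N(0, 1), by the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        z = lo + k * h
        w = 0.5 if k in (0, steps) else 1.0
        total += w * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)

def uncoded_mmse(snr):
    """MMSE of uncoded BPSK over the BIAWGN channel (assumes SNR = 1/sigma^2)."""
    return 1.0 - _gauss_expect(lambda z: math.tanh(snr + math.sqrt(snr) * z))

def biawgn_capacity(snr):
    """BIAWGN capacity in bits per channel use (assumes SNR = 1/sigma^2)."""
    def loss(z):
        llr = 2.0 * snr + 2.0 * math.sqrt(snr) * z  # LLR given X = +1
        # Numerically stable log2(1 + exp(-llr)).
        return (max(-llr, 0.0) + math.log1p(math.exp(-abs(llr)))) / math.log(2.0)
    return 1.0 - _gauss_expect(loss)

def shannon_limit_snr(rate, lo=1e-6, hi=100.0, iters=60):
    """Smallest SNR whose capacity reaches `rate`, found by bisection."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if biawgn_capacity(mid) < rate:
            lo = mid
        else:
            hi = mid
    return hi

def good_code_mmse(snr, rate):
    """Asymptotic MMSE that the text attributes to any "good" code of this rate:
    equal to the uncoded MMSE below the Shannon limit, zero above it."""
    return uncoded_mmse(snr) if snr < shannon_limit_snr(rate) else 0.0
```

A finite-difference check of the mutual information and MMSE relation of [23] (the derivative of the capacity in nats equals half the uncoded MMSE) is a convenient sanity test for such a sketch.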

A similar observation was made by Berrou et al. [5,
Sec. II.B] in their choice of the components of turbo codes,
for point-to-point communications. The "bad" codes they used, 
however, were combined and manipulated (by parallel concate- 
nation) to produce other, relatively "good" codes. When "bad" 
codes are used unaltered, their above-mentioned advantage is 
generally meaningless in point-to-point communications. In 
such scenarios, we are typically interested in complete, reliable 
decoding (an asymptotically zero error probability), and so all 
non-zero levels of MMSE are equally unacceptable. 

With multi-terminal communications, however, estimates 
may be perceived as a form of partial decoding, and computed
at nodes other than the destinations of a given message, as 
byproducts of communication strategies. In this context, the 
advantage of "bad" codes may be meaningful. In this paper 
we are interested in quantifying the advantage in terms of 
achievable rates at destination nodes. 

In our analysis, we focus on two simple multi-terminal 
channel models: The interference channel, as introduced by 
Shannon [55], and the relay channel, as introduced by Van 
Der Meulen [58]. Both channels are illustrated in Fig. 2.

An interference channel is characterized by two pairs of 
nodes, each pair consisting of a source and destination that 
wish to communicate. Unlike point-to-point channels, each 
destination experiences interference resulting from the signal 
produced by the source of the other pair. In a relay channel, a 
single pair of source and destination nodes wish to communi- 
cate, but are aided by a relay node which lends its resources 
to support their communications. These channels capture two
of the fundamental phenomena that characterize wireless
networks: interference between nodes (e.g. resulting from the
shared wireless medium) and the potential of cooperation to
achieve better performance. The capacities of both channels
are in general still unknown.

1 To simplify our discussion in this section, we neglect some rigorous details.
A precise description of the curve is provided in Appendix U and involves the
asymptotic normalized MMSE of a "good" code-sequence.

2 This follows from the analysis of [9], [53].

3 One exception is joint source-channel coding, where "bad" codes were
considered [28].

Fig. 2. Two multi-terminal channel models. (a) An interference channel
(Source 1 and Destination 1, Source 2 and Destination 2). (b) A relay
channel (Source, Relay, Destination).

In this paper, we focus on two classes of communication
strategies: soft interference cancelation (soft-IC) for inter-
ference channels, and soft decode-and-forward (soft-DF) for
relay channels. With soft-IC, each of the destinations in an
interference channel computes a soft estimate of the interfering 
codeword as a byproduct of its decoding algorithm. With 
soft-DF, the relay uses its observed signal to compute a soft 
estimate of the signal transmitted by the source. Thus, both 
strategies use estimation as a method for partial decoding. 

While an overwhelming body of research exists on soft-IC, 
the majority of it (e.g. [61], [6], [10], [29], [52]) focuses on 
the approach's application as a component of larger, iterative 
schemes designed to achieve complete (rather than partial) 
decoding of multiple signals. The concepts of soft-IC for the 
purpose of partial decoding of interference, as considered in 
this paper, were proposed by Divsalar et al. [15] as well 
as [34], [62], [24]. Soft-DF was proposed by Sneessens and 
Vandendorpe [56], and related concepts were also examined 
in [60], [65], [41], [35]⁴.

Partial decoding, as applied by soft-IC and soft-DF, is a 
useful compromise in cases where complete decoding would 
be desirable if possible, but is not required by the terms 
of the problem. In an interference channel, decoding of the
interfering signal at each destination would enable the node
to eliminate its interference. In a relay channel, decoding of
the signal transmitted by the source would enable the relay
to better assist the source in the delivery of the associated
message to the destination. Both signals, however, are not
required at the respective nodes, and may be discarded once 
communication is over. Insisting on their complete decoding 
often imposes a burden on the communication strategy, which 
may outweigh the signals' usefulness. Partial decoding is a
means of balancing these two considerations.

4 Similar concepts were also examined by Gomadam and Jafar [22]. In their
work, however, estimation is performed individually on each received symbol,
and dependence between symbols implied by the code is not exploited.

Our interest in partial decoding also follows from its cen-
tral role in other communication strategies, besides soft-IC
and soft-DF. The best-known rates for the interference and
relay channels are achieved by Han-Kobayashi (HK) [25] and
partial decode-and-forward⁵ (partial-DF) [12, Theorem 7], [31],
respectively. Both strategies, however, achieve partial decoding
by rate-splitting⁶, rather than estimation as discussed above.
Rate-splitting involves using codes that were each constructed
by combining two (generally, two or more) other, auxiliary
codes. With HK, for example, this gives each destination
the option of decoding one of the two auxiliary codewords
that constitute the interfering signal (in addition to the two
that constitute the signal from its own source), amounting
to a partial decoding of the interference. From a practical
perspective, however, soft-IC and soft-DF enjoy a number of
advantages over HK and partial-DF, which will be discussed
in Sec. I-B below.

Interestingly, many implementations of soft-DF (e.g. [56])
involve low constraint length convolutional codes (constraint-
4 in [56]), which, as discussed in Sec. I-B below, are point-
to-point "bad". In [56], [60], [41], [35] and [15], [34], [62],
[24], efficient practical implementations of soft-DF and soft- 
IC were proposed, and extensive simulations were reported 
that confirm the methods' effectiveness. 

A rigorous analysis of these strategies, however, is often dif-
ficult to achieve⁷ (see Sec. I-C for a discussion of some of the
difficulties). In this paper, we nonetheless develop rigorously- 
proven bounds on the performance of soft-DF and soft-IC, 
in terms of achievable communication rates at asymptotically 
large block lengths. Like [56] we focus on applications of soft- 
DF (as well as soft-IC) that involve "bad" codes. Furthermore, 
we place a specific emphasis on a comparison with "good" 
codes. 

B. Motivation 

Our interest in this problem was motivated in part by the 
potential for achieving a better tradeoff between achievable 
rates and decoding complexity, in comparison to point-to-point 
scenarios. 

With many classes of codes, "goodness" of point-to-point
performance and decoding complexity are both related to
the complexity of the codes' structures. The constraint length
of a convolutional code [59], for example, is a measure of 
the complexity of its structure. Codes with higher constraint 
lengths have complex trellis diagrams, and thus more com- 
plex structures. Similarly, the density of an LDPC code's 
parity check matrix [48], as measured by the average weight 
(number of ones) in each row, is arguably a measure of the 
complexity of its structure (higher density meaning greater 
complexity). Over point-to-point channels, theoretical results
for both classes of codes imply that the complexities of
their structures must approach infinity as their code rates ap-
proach capacity, for reliable communication to be possible [20,
Theorem 3.3], [9], [53], [59, Sec. 5.4]. However, the decoding
complexities of both LDPC and convolutional codes (via the
belief-propagation and Viterbi algorithms, respectively) grow
unboundedly with the complexities of their structures⁸ (as well
as with their block lengths) [20], [19].

5 In [31] it is known as multipath decode-and-forward.

6 This term was coined by Rimoldi and Urbanke [50], in the context of
coding for multiple-access channels.

7 Note that in [65], simulation results were augmented by an EXIT chart
analysis. EXIT charts, however, rely on heuristic assumptions and do not
constitute rigorous analysis.

Thus, from the perspective of decoding complexity, simple- 
structured codes are advantageous. Formally, we define a
sequence of codes to be simple-structured if the complexities
of its codes are bounded, and complex-structured if they are
not. By our above discussion, simple-structured codes are 
bounded away from the capacities of point-to-point channels, 
i.e. they are "bad". 

Over multi-terminal channels, however, our discussion in 
Sec. I-A implies that simple-structured "bad" codes sometimes
exhibit advantages in terms of partial-decoding at various 
network nodes, which could perhaps be used to compensate for 
their weaknesses. In our analysis of soft-IC and soft-DF in this 
paper, we provide examples of simple-structured LDPC codes 
whose performance, in terms of achievable communication 
rates, provably surpasses complex-structured "good" codes. 

Our results raise the possibility that simple-structured codes 
fare better over multi-terminal channels, in terms of the gap 
from capacity, than they do over point-to-point channels. If 
so, their simple structures may be applied to improve the 
tradeoff between achievable rates and decoding complexity in 
multi-terminal scenarios. In this paper, we do not resolve these 
questions. Our objective is to motivate further research into the 
problem, and into benefits of simple-structured codes. 

Note that rate-splitting, as used by HK [25] and partial- 
DF [12, Theorem 7], also frequently yields "bad" codes. 
In an example provided in Sec. V-B, the best application
of HK that we found relies on provably "bad" codes. The
applications of rate-splitting in [25] and [12, Theorem 7], 
however, involve randomly-generated auxiliary codes. Unlike 
the simple-structured "bad" codes discussed above, such codes 
are unstructured, and no low-complexity decoding algorithms 
are known for them. Thus, at least with respect to these 
applications of HK and partial-DF, soft-IC and soft-DF as 
described e.g. in [56] and [62], have an advantage. 

Soft-IC enjoys an additional advantage over HK. With HK, 
partial decoding implies that each receiver must jointly decode 
three codewords, two from its own source and one that consti- 
tutes a part of the interference. Soft-IC requires it to examine 
just two codewords; decoding one, and estimating the other 
(the interference). As the computation time of many algorithms 
for decoding and estimation (e.g. [6]) grows exponentially with 
the number of codewords jointly examined, this constitutes an 
advantage⁹.

8 Despite the remarkable performance of many LDPC codes (e.g. [49]) at
rates that are very close to capacity, their densities imply impossible decoding
complexity.

9 Note that joint decoding can be avoided by resorting to further rate-
splitting, using concepts similar to the ones suggested by [50]. However, such
methods are suboptimal at all but asymptotically large block lengths.

Beyond our interest in decoding complexity, our analy-
sis may shed light on the best achievable rates (capacities)
of multi-terminal channels. As mentioned above, the best
rigorously-proven known results for the interference and relay 
channels, were obtained by applications of HK and partial- 
DF (respectively) that involve randomly generated codes. 
Recently, Philosof and Zamir [46], Nazer and Gastpar [42] 
and Narayanan et al. [40] demonstrated that structured codes 
sometimes exhibit advantages over random ones, in vari-
ous communication scenarios. Our focus on simple-structured
codes is different from theirs. However, as the capacities of
both the interference and relay channels are yet unknown, 
an analysis which builds on our methods may be of similar 
theoretical interest. 

C. Overview of Main Results 

Our focus in this paper is thus on the performance of soft- 
DF and soft-IC, in terms of achievable communication rates, 
when used with simple-structured "bad" codes. An analysis 
of soft-DF is complicated by a number of factors. First, stan- 
dard information-theoretic techniques do not straightforwardly 
apply to non-random simple-structured codes. 

More importantly, however, analysis of soft-DF is compli- 
cated by unfavorable attributes of soft decoding. Consider 
the relay's estimation of the codeword transmitted by the 
source. As explained later, this estimate is conveyed to the 
destination, which relies on it in its decoding process. Ideally, 
we would treat the estimation error as additive white noise, and 
apply standard coding-theoretic analysis techniques. However, 
the estimation error is very different from such noise. The 
components of the error vector can generally be shown to be 
strongly correlated¹⁰, and the correlation patterns are complex.
The error is not independent of the transmitted codeword. 
Lastly, the error cannot be argued to be independent of the 
codebook in use, because the estimation process relies on the 
structure of the code. In this paper we develop methods for 
overcoming these difficulties. 

To enable rigorous analysis, we develop a variation of soft- 
DF that is analytically tractable. Our variation assumes the 
use of LDPC codes [20], and relies on their associated belief- 
propagation (BP) algorithm as a method of soft estimation. We 
thus refer to it as soft-DF-BP. Our analysis of this algorithm 
applies to erasure relay channels (defined in Sec. III-A below).

Our choice to focus on LDPC codes follows from the elab- 
orate design and analysis tools that exist for them (e.g. [47]). 
This choice may appear unusual, as LDPC codes are known 
primarily for their relative "goodness", i.e. the possibility of 
capacity-approaching performance over many point-to-point 
channels (e.g. [49], [36]). However, as noted in Sec. I-B,
"bad" LDPC codes exist as well. Such codes can be designed
by manipulating the edge distributions that determine the
structure of their underlying Tanner graphs (see Sec. II-E).

BP in the literature is typically used for complete decod- 
ing, not for soft estimation. However, the algorithm in fact 
approximates bitwise maximum a-posteriori (MAP) decisions, 
and so is actually an estimation algorithm. Over point-to-point 
channels, BP has typically been applied in scenarios where
the level of noise is low enough for the proportion (fraction)
of erroneously decoded bits to be small, essentially amounting
to complete decoding. In this paper, we will examine its
performance in other scenarios as well¹¹.

10 This follows because optimal estimation is in general not achieved by
symbol-wise computation.

Our definition of soft-DF-BP is based on compress-and-
forward (CF) [12], [32], [26]. With CF (see Sec. III-A), the
relay forwards its channel output vector to the destination.
The destination combines this with its own channel obser-
vation, and attempts to decode using both. With soft-DF-BP
(see Sec. III-B), the relay first applies BP to estimate the
transmitted codeword. While complete decoding is in general
not possible, estimation reduces the level of noise, thus
improving the quality of the signal delivered to the destination.
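
To illustrate how estimation at the relay can reduce the number of erasures without achieving complete decoding, the following sketch runs the standard iterative erasure-filling form of BP over an erasure channel. It is only an illustration under our own assumptions: the parity-check matrix is that of a small (7,4) Hamming code chosen for readability (not an LDPC code from this paper), the function name is hypothetical, and the formal definition of soft-DF-BP is the one given in Sec. III-B.

```python
E = "e"  # erasure symbol

# Hypothetical toy parity-check matrix (a (7,4) Hamming code), for illustration only.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

def bp_erasure_estimate(H, y, max_iters=100):
    """Iteratively fill in erasures: whenever a parity check has exactly one
    erased neighbor, that bit equals the modulo-2 sum of the other neighbors."""
    x = list(y)
    for _ in range(max_iters):
        progress = False
        for row in H:
            idx = [j for j, h in enumerate(row) if h]
            erased = [j for j in idx if x[j] == E]
            if len(erased) == 1:
                j = erased[0]
                x[j] = sum(x[k] for k in idx if k != j) % 2
                progress = True
        if not progress:
            break  # no check can resolve a further erasure
    return x

# The codeword 1011010 observed with four erasures: one is resolved, three remain.
received = [1, 0, E, 1, E, E, E]
estimate = bp_erasure_estimate(H, received)
```

In this example the estimate is not a complete decoding, yet it contains fewer erasures than the raw channel output, which is the effect the relay exploits.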

Our analysis of soft-DF-BP focuses on the performance of 
simultaneous BP (sim-BP), a hypothetical algorithm which 
accesses the channel outputs from both the relay and the desti- 
nation. In practice, the two nodes are physically separated, and 
so the algorithm cannot be realized. It is nonetheless designed 
so that its performance can be used to upper bound the number 
of bit errors at the output of soft-DF-BP. Furthermore, the 
structure of sim-BP enables its analysis using an extension 
of density evolution, a method devised for the analysis of 
LDPC codes over point-to-point channels [47]. We refer to 
the extension as simultaneous density evolution (sim-DE). 

As with CF, with soft-DF-BP the relay exploits its channel 
to the destination, to deliver the estimate it computed to that 
node. To reduce the demands on bandwidth (rate) to fit the 
available capacity of this channel, the delivered signal is first 
compressed and distorted (lossy compression). In our analysis, 
we demonstrate that the simple structure of "bad" LDPC codes 
can often be applied to improve the compression rate and 
reduce the distortion. 

Turning to soft-IC, its analysis is simplified by powerful 
analysis tools that were developed in the context of mul- 
tiuser detection, e.g. by Boutros and Caire [6] and Am- 
raoui et al. [1] (and references therein). We focus on a 
variation of soft-IC which essentially coincides with joint 
iterative multi-user detection (iterative-MUD) as suggested in 
these references. In our context, we refer to the algorithm as 
soft-IC-BP. This choice of terminology reflects the algorithm's 
role in our setting, as explained below. Our analysis applies 
to symmetric BIAWGN interference channels. 

Iterative-MUD is typically applied in multiple-access set- 
tings, where a destination attempts to decode (completely) a 
superposition of two (or more) signals from different users. 
Like BP, however, iterative-MUD in fact approximates bitwise 
MAP decisions, and thus straightforwardly applies to estima- 
tion as well. With soft-IC-BP, each of the two destinations 
applies the algorithm to attempt to decode the codewords from 
both sources. Unlike multiple-access settings, with soft-IC-BP 
we relax the requirement of complete decoding of the inter- 
fering codeword, and tolerate a large error in its estimation. 
Analysis tools of iterative-MUD carry over straightforwardly 
to the analysis of soft-IC-BP.

11 A similar approach was taken by Barak et al. [3], in the context of
communication over erasure channels with unknown erasure probabilities.

As benchmarks for comparison, we consider strategies that
rely on "good" codes. Over symmetric BIAWGN interference
channels, we examine single user detection (SUD) and mul-
tiuser detection (MUD)¹² (see Sec. IV-A). Furthermore, we
develop tight bounds on the achievable rates of communication
with any set of "good" codes, and prove that they cannot
exceed the performance of SUD and MUD. Over erasure
relay channels, we compare our performance with the above-
mentioned CF, as well as with decode-and-forward (DF) (see
Sec. III-A). We also provide bounds on applications of soft-
DF-BP that use "good" codes, which are valid under certain 
plausible assumptions. 

To demonstrate the effectiveness of our constructions, we 
design specific applications of soft-DF-BP and soft-IC-BP, 
that provably outperform the above benchmarks. As with 
communication over point-to-point channels, the identities of 
the LDPC codes in use play a crucial role in the performance 
of soft-DF-BP and soft-IC-BP. To design effective codes, we 
extend a technique proposed by Richardson et al. [49] from 
point-to-point channels to our settings. 

As noted in Sec. I-B, our objective in this paper is to
demonstrate the potential of simple-structured codes, for which 
we believe low-complexity algorithms exist. Design of such 
algorithms, however, is beyond the scope of this work. Specifi- 
cally, with soft-DF-BP, our discussion leaves out the details of 
compression at the relay and communication of the estimate 
to the destination. We assume that these components of the 
strategy are achieved in the same way as CF, which applies 
high-complexity computations. 

This paper is organized as follows. In Sec. II we intro-
duce some preliminary notations and definitions. We formally
define "good" and "bad" codes, and provide some relevant
background on LDPC codes. In Sec. III we define soft-DF-
BP, develop its analysis tools, and derive our bounds on the
performance of "good" codes. In Sec. IV we do the same with
soft-IC-BP. In Sec. V we present specific applications of soft-
DF-BP and soft-IC-BP. Finally, Sec. VI concludes the paper.
Throughout the paper, proofs and various details are deferred 
to the appendix. 

II. Preliminaries 

A. General Notation 

Vector values are denoted by boldface (e.g. $\mathbf{x}$) and scalars
by normalface (e.g. $x$). Random variables are upper-cased ($X$)
and their instantiations lower-cased ($x$). $E$ denotes expecta-
tion. $\exp(x)$ denotes the exponential function, $e^x$ (we will
use both notations interchangeably). $\ln$ denotes the natural
logarithm (to the base $e$) and $\log$ denotes the base 2 logarithm.
Correspondingly, all communication rates are given in bits per
channel use. $[a, b]$ denotes the interval $\{x \in \mathbb{R} : a \le x \le b\}$,
and $(a, b)$ denotes $\{x \in \mathbb{R} : a < x < b\}$.

Given a node $i$ in a graph, $\mathcal{N}(i)$ is the set of nodes that
are adjacent to $i$. Given a vector $\mathbf{x} = (x_1, \ldots, x_n)$, we let
$\mathbf{x}_{\sim i}$ denote the vector obtained from $\mathbf{x}$ by omitting $x_i$,
that is,

$$\mathbf{x}_{\sim i} \triangleq (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n) \qquad (1)$$

12 While HK often also involves "good" codes, they are combined by rate- 
splitting to produce other, typically "bad", codes. 
$h(x)$ denotes the binary entropy function,

$$h(x) = -x \log x - (1 - x) \log(1 - x)$$

$n$ will typically denote the block length of a code, whose
identity will be clear from the context. $o(1)$ is a term that
approaches zero as $n$ grows.

B. Binary Input AWGN and Binary Erasure Channels 

We now define two point-to-point channels which we will
use throughout the paper to classify codes as "good" or "bad".
The binary-input additive white Gaussian noise (BIAWGN) channel
is characterized by the equation,

$$Y = X + Z \qquad (2)$$

where $Y$ is the channel output, $X$ (the transmitted signal) is
taken from $\{\pm 1\}$, and $Z$ is a zero-mean real-valued Gaussian
random variable with variance $\sigma^2$, whose realizations at differ-
ent time instances are statistically independent. $\sigma$ is a positive
constant.

The binary erasure channel (BEC) is characterized by,

$$Y = \begin{cases} e, & \text{with probability } \delta \\ X, & \text{otherwise} \end{cases} \qquad (3)$$

where $Y$ is the channel output, $X$ is the channel input, taken
from $\{0, 1\}$, $\delta \in [0, 1]$ is a constant, and $e$ is a symbol
indicating an "erasure" event. We assume that the channel
transitions at different time instances are independent. We let
BEC($\delta$) denote a BEC with erasure probability $\delta$.

We define the Shannon limit for the BIAWGNC and BEC in 
the usual way, as the inverse of the Shannon capacity function: 

Definition 1: Let $R \in [0, 1]$. The BIAWGN (resp. BEC)
Shannon limit for rate $R$ is the minimal (resp. maximal)
value $\mathrm{SNR}^*$ (resp. erasure probability $\delta^*$) such that reliable
communication is possible at rate $R$.

C. Notations for Analysis of Erasures 

The following notations will be useful in our analysis of
erasure channels. For simplicity, we rewrite (3) as,

$$Y = X + E \qquad (4)$$

where $E$ is an erasure noise random variable, denoted
$E \sim \mathrm{Erasure}(\delta)$ and equal to $e$ with probability $\delta$ and to $0$ other-
wise, and addition of two values $x_1, x_2 \in \{0, 1, e\}$ is defined
as,

$$x_1 + x_2 \triangleq \begin{cases} e, & x_1 = e \text{ or } x_2 = e \\ x_1 \oplus x_2, & \text{otherwise} \end{cases}$$

where $\oplus$ denotes modulo-2 addition. Note that the sum of
two independent erasure noise variables $E_1 \sim \mathrm{Erasure}(\delta_1)$
and $E_2 \sim \mathrm{Erasure}(\delta_2)$ is also an erasure noise, distributed as
$\mathrm{Erasure}(\delta_1 \circ \delta_2)$ where,

$$\delta_1 \circ \delta_2 = \delta_1 + \delta_2 \cdot (1 - \delta_1) \qquad (5)$$
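
The erasure addition and the composition rule for erasure probabilities can be checked with a short simulation. This is a minimal sketch under our own conventions: the erasure symbol is represented by the string "e", and the function names are hypothetical.

```python
import random

E = "e"  # erasure symbol

def erasure_add(x1, x2):
    """Sum of two values from {0, 1, e}: erased if either operand is erased,
    otherwise the modulo-2 sum."""
    if x1 == E or x2 == E:
        return E
    return x1 ^ x2

def compose(d1, d2):
    """Erasure probability of E1 + E2 for independent Erasure(d1), Erasure(d2)."""
    return d1 + d2 * (1.0 - d1)

def sample_erasure_noise(delta, rng):
    """One draw of erasure noise: e with probability delta, 0 otherwise."""
    return E if rng.random() < delta else 0

def empirical_sum_rate(d1, d2, trials=200_000, seed=0):
    """Monte Carlo estimate of the probability that E1 + E2 is an erasure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        e1 = sample_erasure_noise(d1, rng)
        e2 = sample_erasure_noise(d2, rng)
        hits += erasure_add(e1, e2) == E
    return hits / trials
```

The composition is symmetric in its two arguments, since it equals one minus the probability that neither variable is erased; the empirical estimate should agree with `compose` up to sampling noise.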
We also define the multiplication of two values
$x_1, x_2 \in \{0, 1, e\}$ as follows,

$$x_1 \cdot x_2 = \begin{cases} x_1 \cdot x_2, & x_1 \neq e \text{ and } x_2 \neq e \\ x_2, & x_1 = e \\ x_1, & x_2 = e \end{cases} \qquad (6)$$

Note that although the product $x_1 \cdot x_2$ is defined also for
cases that $x_1$ and $x_2$ are not erasures and $x_1 \neq x_2$, we will
not encounter such cases in practice. We define the product
between two vectors $\mathbf{x}_1, \mathbf{x}_2 \in \{0, 1, e\}^n$ as the vector obtained
by multiplying the respective components.

Finally, we introduce two more definitions, 

Definition 2: Let $\mathbf{x} \in \{0, 1, e\}^n$. The erasure rate of $\mathbf{x}$,
denoted $P_e(\mathbf{x})$, is the fraction of its components that are equal
to an erasure.

Definition 3:

1) Let $x, y \in \{0, 1, e\}$. We say that $y$ is degraded with
respect to $x$ if the following two conditions do not hold
simultaneously: $x = e$ and $y \neq e$.

2) If $\mathbf{x} \in \{0, 1, e\}^n$ and $\mathbf{y} \in \{0, 1, e\}^n$, we say that $\mathbf{y}$ is
degraded with respect to $\mathbf{x}$ if for all $i = 1, \ldots, n$, $y_i$ is
degraded with respect to $x_i$. Equivalently, $\mathbf{y}$ is degraded
with respect to $\mathbf{x}$ if the set of indices of $\mathbf{y}$ that are
erasures contains the equivalent set for $\mathbf{x}$.
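
The product, the erasure rate, and the degradation relation can be expressed directly in code. The sketch below is illustrative only: vectors are Python lists over {0, 1, "e"}, the function names are ours, and for two known values the product is taken as the ordinary product, which equals the common value in the only case (equal operands) that the text says arises.

```python
E = "e"  # erasure symbol

def erasure_product(x1, x2):
    """Product on {0, 1, e}: an erased factor is replaced by the other one."""
    if x1 == E:
        return x2
    if x2 == E:
        return x1
    # Both known: for x1 == x2 the ordinary product equals the common value.
    return x1 * x2

def vector_product(xs, ys):
    """Componentwise product of two equal-length vectors."""
    return [erasure_product(a, b) for a, b in zip(xs, ys)]

def erasure_rate(x):
    """Fraction of components equal to an erasure (Definition 2)."""
    return sum(1 for v in x if v == E) / len(x)

def is_degraded(y, x):
    """True if y is degraded with respect to x (Definition 3): wherever x is
    erased, y is erased as well."""
    return all(not (xi == E and yi != E) for xi, yi in zip(x, y))

# Two erased observations of the same word 1 0 0 1, and their product.
y1 = [1, E, 0, E]
y2 = [E, E, 0, 1]
combined = vector_product(y1, y2)
```

Here each observation is degraded with respect to the combined vector, whose erasure rate is the fraction of positions erased in both observations.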

D. "Good" Codes 

Our definition of "good" codes is a variation of the def-
inition of Shamai and Verdú [54]. For simplicity, we have
specialized it for BIAWGNCs and BECs, but it can straight-
forwardly be generalized to other classes of channels as well.
The definition focuses on sequences of codes $\mathcal{C} = \{C_n\}_{n=1}^{\infty}$.
Given such a sequence, we define its rate as the limit of the
rates of the individual codes, if the limit exists.

Definition 4: Let $\mathcal{C} = \{C_n\}_{n=1}^{\infty}$ be a sequence of codes of
rate $R$.

1) We say that $\mathcal{C}$ is "good" for the BIAWGN channel if the
following holds:

$$\lim_{n \to \infty} P_e(C_n; \mathrm{SNR}) = 0, \quad \forall\, \mathrm{SNR} > \mathrm{SNR}^*$$

where $\mathrm{SNR}^*$ is the Shannon limit for the BIAWGN at
rate $R$ and $P_e(C_n; \mathrm{SNR})$ is the probability of error under
maximum-likelihood (ML) decoding, when the code $C_n$
is used over a BIAWGN with the specified SNR.

2) We say that $\mathcal{C}$ is "good" for the BEC if the following
holds:

$$\lim_{n \to \infty} P_e(C_n; \delta) = 0, \quad \forall\, \delta < \delta^*$$

where $\delta^* = 1 - R$ is the Shannon limit for the BEC of
rate $R$ and $P_e(C_n; \delta)$ is the probability of error under
ML decoding, when the code $C_n$ is used over a BEC
with an erasure probability of $\delta$.

Remark 1: For simplicity of notation, we adopt the convention
that the block length of $C_n$ is $n$.

We refer to a code-sequence as "bad" for a particular
class of channels if it is not "good". We will occasionally
use the terms "good" or "bad" without mentioning the
class of channel, whenever the class will be clear from the
context. Specifically, when discussing erasure relay channels,
"goodness" will be assumed to relate to the BEC, and when
discussing BIAWGN interference channels, "goodness" will
relate to the BIAWGN.

The existence of "good" code-sequences is guaranteed
by the achievability proof of channel capacity (e.g. [13]; see
footnote 13). Furthermore, BIAWGN and BEC channels admit "good"
sequences of linear codes (see footnote 14).

E. LDPC Codes 

LDPC codes play a central role in our analysis. A
comprehensive review of these codes is available in, e.g., [48]. For
completeness, we now describe some of their essential
features, which we will use in our analysis.

An LDPC code is characterized by a bipartite Tanner
graph [57], as in Fig. 3. The nodes on its left side are called
variable nodes, and each corresponds to a transmitted codebit.
The nodes on the right are check nodes, and each corresponds
to a parity-check. The codewords of the LDPC code are
defined by the condition that at each check node, the set of
codebits corresponding to adjacent variable nodes must sum
to zero (modulo-2).



Fig. 3. An example of the Tanner graph of an LDPC code (variable nodes on the left, check nodes on the right).

The performance of an LDPC code is determined by
the structure of its Tanner graph. Luby et al. [37] suggested
graphs characterized by two probability vectors,
$\lambda = (\lambda_1, \ldots, \lambda_c)$ and $\rho = (\rho_1, \ldots, \rho_d)$, which are known as edge
distributions. In a $(\lambda, \rho)$ Tanner graph, for each $i = 1, \ldots, c$ a
fraction $\lambda_i$ of the edges has left degree $i$, meaning that they
are connected to a variable node of degree $i$. Similarly, for
each $j = 1, \ldots, d$ a fraction $\rho_j$ of the edges has right degree
$j$, meaning that they are connected to a check node of degree
$j$. While $\lambda_i$ refers to the fraction of edges, the fraction of
variable nodes of degree $i$ is in general different, and can be
shown to equal,

$$\tilde{\lambda}_i = \frac{\lambda_i / i}{\sum_{k=1}^{c} \lambda_k / k} \qquad (7)$$



13 To make this statement precise by our above definition of "good"
codes, we invoke the well-known fact that the capacity-achieving distribution
for all BIAWGN and erasure channels is the same (Bernoulli(1/2)) [21,
Theorem 4.5.1].

14 This was proven by Elias [16] for binary symmetric channels and later
extended to other binary-input symmetric-output channels (see e.g. [59]).

The fraction $\tilde{\rho}_j$ of check nodes of degree $j$ can similarly be
obtained from $\rho$.
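
The conversion from edge fractions to node fractions is a simple normalization. The sketch below assumes a hypothetical representation of an edge distribution as a dictionary mapping each degree to its edge fraction; the function name is illustrative only.

```python
from fractions import Fraction


def edge_to_node_fractions(edge_dist):
    """Map edge-perspective fractions {degree: fraction} to node-perspective ones."""
    # An edge fraction f at degree i accounts for a number of nodes proportional to f / i.
    weights = {deg: Fraction(frac) / deg for deg, frac in edge_dist.items()}
    total = sum(weights.values())
    return {deg: w / total for deg, w in weights.items()}


# Half of the edges attach to degree-2 nodes and half to degree-3 nodes.
lam = {2: Fraction(1, 2), 3: Fraction(1, 2)}
print(edge_to_node_fractions(lam))
```

Here 3/5 of the nodes have degree 2 and 2/5 have degree 3: low-degree nodes are more numerous than their share of the edges suggests.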

A Tanner graph is said to be $(c, d)$-regular if all the variable
nodes have degree $c$ and all the check nodes have degree $d$.
Equivalently, the graph is characterized by a pair $(\lambda, \rho)$ where
$\lambda_c = 1$ and $\rho_d = 1$. A code is said to be $(\lambda, d)$-right-regular if
it is characterized by $(\lambda, \rho)$ where $\rho_d = 1$, i.e. all check nodes
have degree $d$.

The LDPC $(\lambda, \rho)$ ensemble is the set of codes whose Tanner
graphs are characterized by $(\lambda, \rho)$. As is often the case in
information theory, analysis of LDPC codes is greatly simplified
by focusing on the average performance of a code selected
at random from such an ensemble, rather than on the
performance of an individual code. Luby et al. [37, Sec. III.A]
suggested a procedure for randomly generating a Tanner graph
that is characterized by a given $(\lambda, \rho)$. Different pairs $(\lambda, \rho)$
may correspond to substantially different performance, and
so much of the analysis of LDPC codes focuses on finding
effective pairs.

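The rate of a $(\lambda, \rho)$ LDPC code can be shown to be lower
bounded by the following value, known as the design rate.

$$R_{\mathrm{design}} = 1 - \frac{\sum_{j=1}^{d} \rho_j / j}{\sum_{i=1}^{c} \lambda_i / i} \qquad (8)$$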
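
As a quick numerical illustration of the design rate, the sketch below evaluates the standard expression $1 - (\sum_j \rho_j / j) / (\sum_i \lambda_i / i)$. The dictionary representation of the edge distributions and the function name are assumptions made for this example.

```python
def design_rate(lam, rho):
    """Design rate of a (lam, rho) ensemble; both map degree -> edge fraction."""
    var_term = sum(frac / deg for deg, frac in lam.items())  # sum_i lambda_i / i
    chk_term = sum(frac / deg for deg, frac in rho.items())  # sum_j rho_j / j
    return 1.0 - chk_term / var_term


# A (3, 6)-regular ensemble: every variable node has degree 3, every check node degree 6.
print(design_rate({3: 1.0}, {6: 1.0}))
```

For the (3, 6)-regular ensemble the value is 1/2, matching the familiar count of one parity check per two codebits.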



Central to the success of LDPC codes has been their efficient
belief-propagation (BP) decoding algorithm. A general
description of the algorithm is available in, e.g., [8, Algorithm
2]. In the special case of transmission over the BEC, the
algorithm has a simple equivalent formulation [37], provided
below. The input to the algorithm is the channel output vector
$\mathbf{y} = [y_1, \ldots, y_n]$ and the output is a vector $\mathbf{y}^{\mathrm{BP}}$ of decisions
(estimates) for the various bits. In the description below we
make use of notation which was introduced in Sec. II-C.

Algorithm 1 (Belief-propagation (BP) over the BEC):
1) Iterations. Perform the following steps, alternately, a
pre-determined $t$ times.
• Rightbound iteration number $\ell = 0, \ldots, t - 1$. At all
edges $(i, j)$ compute the rightbound messages $r_{ij}^{(\ell)}$
as follows,

$$r_{ij}^{(\ell)} = \begin{cases} y_i, & \ell = 0, \\ y_i \cdot \prod_{j' \in \mathcal{N}(i) \setminus \{j\}} l_{j'i}^{(\ell)}, & \ell > 0, \end{cases} \qquad (9)$$

where $l_{j'i}^{(\ell)}$ is a leftbound message computed in the
preceding leftbound iteration.
• Leftbound iteration number $\ell = 1, \ldots, t$. At all edges
$(j, i)$ compute leftbound messages $l_{ji}^{(\ell)}$ as follows,

$$l_{ji}^{(\ell)} = \sum_{i' \in \mathcal{N}(j) \setminus \{i\}} r_{i'j}^{(\ell - 1)} \qquad (10)$$

2) Final decisions. For each $i = 1, \ldots, n$ compute,

$$y_i^{\mathrm{BP}} = y_i \cdot \prod_{j \in \mathcal{N}(i)} l_{ji}^{(t)} \qquad (11)$$

Note that the right hand side of (11) may potentially be
an erasure. In some formulations of the BP algorithm, $y_i^{\mathrm{BP}}$
is randomly set to 0 or 1 whenever this happens. In this
paper, however, we allow $y_i^{\mathrm{BP}}$ to remain an erasure. As noted
in Sec. I-C, our analysis will include cases where many of
the components of $\mathbf{y}^{\mathrm{BP}}$ remain erasures, amounting to an
incomplete decoding of the transmitted codeword.
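
To make the message schedule concrete, here is a minimal sketch of BP over the BEC on a toy parity-check structure. Several things are assumptions of this example rather than statements from the paper: erasures are encoded as `None`, the product of two symbols is taken to be erased only when both factors are erased, the modulo-2 sum is erased whenever any term is erased, and the helper names (`combine`, `xor_sum`, `bp_bec`) are hypothetical.

```python
def combine(a, b):
    """Product of two observations of one bit: erased (None) only if both are erased."""
    return b if a is None else a


def xor_sum(values):
    """Modulo-2 sum that is erased as soon as any term is erased."""
    return None if any(v is None for v in values) else sum(values) % 2


def bp_bec(y, checks, t):
    """Run t >= 1 BP iterations; checks[j] lists the variable nodes of check j."""
    if t < 1:
        raise ValueError("at least one iteration is required")
    n = len(y)
    nbrs = [[j for j, c in enumerate(checks) if i in c] for i in range(n)]
    left = {}  # left[(j, i)]: most recent message from check j to variable i
    for ell in range(t):
        # Rightbound step: channel value times the other incoming leftbound messages.
        right = {}
        for i in range(n):
            for j in nbrs[i]:
                msg = y[i]
                if ell > 0:
                    for j2 in nbrs[i]:
                        if j2 != j:
                            msg = combine(msg, left[(j2, i)])
                right[(i, j)] = msg
        # Leftbound step: each check sends the sum of the other bits it involves.
        left = {(j, i): xor_sum([right[(i2, j)] for i2 in c if i2 != i])
                for j, c in enumerate(checks) for i in c}
    # Final decisions: channel value times all incoming leftbound messages.
    decisions = []
    for i in range(n):
        dec = y[i]
        for j in nbrs[i]:
            dec = combine(dec, left[(j, i)])
        decisions.append(dec)
    return decisions


# Parity checks of a (7,4) Hamming code; all-zero codeword with bits 0 and 1 erased.
checks = [[0, 1, 2, 4], [0, 1, 3, 5], [0, 2, 3, 6]]
y = [None, None, 0, 0, 0, 0, 0]
print(bp_bec(y, checks, 1))
print(bp_bec(y, checks, 2))
```

After one iteration only bit 0 is recovered (through the third check), so the output still contains an erasure, which is the kind of partial decoding the text refers to. A second iteration then resolves bit 1.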

III. Coding for the Erasure Relay Channel 
A. Channel Model and Achievable Strategies 

Fig. 4 depicts our model for the binary erasure relay channel.
It is a variation of the model suggested by Kramer [30]
and is a special case of the models of [27], [12]. The channel
is characterized by a triplet $(\delta_2, \delta_3, C_0)$, explained below.

Like [27], we assume that the channels to the destination
from the source and from the relay are decoupled. That is,
the destination receives two independent channel observations:
$Y_3$, which is a function of the source signal $X_1$, and $Y_3'$,
which is a function of the relay signal $X_2$. As we will see,
this assumption simplifies the analysis, while retaining the
essential challenges facing the design of relay communication
strategies. Following [27], we characterize the channel from
the relay to the destination by its capacity $C_0$ alone, and
refrain from specifying the channel transition probabilities.
The precise probabilities will be inconsequential, and our
strategies will apply equally regardless of them (as in [27]).

We assume that the source signal $X_1$ is received by the
relay and the destination via independent memoryless BECs
with erasure probabilities $\delta_2$ and $\delta_3$, respectively. Using the
notation of Sec. II-C, this means that $Y_2$ and $Y_3$ are related to
$X_1$ via,

$$Y_2 = X_1 + E_2, \qquad Y_3 = X_1 + E_3 \qquad (12)$$

where $E_2$ and $E_3$ are independent erasure noise variables,
distributed as Erasure($\delta_2$) and Erasure($\delta_3$), respectively.
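
The two source links can be simulated directly. The sketch below is illustrative only: it assumes erasures are encoded as `None`, and the helper name `erasure_channel`, the seed and the example parameters are all hypothetical.

```python
import random


def erasure_channel(x, delta, rng):
    """Pass the bits of x through a memoryless BEC with erasure probability delta."""
    # Each bit is erased independently with probability delta.
    return [None if rng.random() < delta else bit for bit in x]


rng = random.Random(1)
x1 = [0, 1, 1, 0, 1, 0, 0, 1]
y2 = erasure_channel(x1, 0.3, rng)  # observation at the relay
y3 = erasure_channel(x1, 0.6, rng)  # observation at the destination
print(y2)
print(y3)
```

Because the two calls draw independent noise, the erasure patterns at the relay and the destination are independent, as the channel model requires.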

Fig. 4. The binary erasure relay channel (nodes: source, relay, destination).



Following [12], we assume that the relay is full-duplex,
meaning that it can listen and transmit simultaneously. We
define communication strategies and achievable rates in the
standard way, see [12]. Specifically, the signal transmitted by
the relay at time $i$ may depend only on the channel outputs it
observed at times $j = 1, \ldots, i - 1$.

Two relay communication strategies that frequently appear
in the literature are decode-and-forward (DF) and
compress-and-forward (CF) (see e.g. [32], [26], [12] as well as the
tutorial [31]). With DF, the relay decodes the codeword from
the source, and then cooperates with that node in the delivery
of the associated message. CF focuses on scenarios where
decoding of the codeword transmitted from the source is not
possible at the relay. An overview of this strategy was provided
in Sec. I-C.

The achievable rates with DF and CF can be obtained using
expressions in [27] (see footnote 15). With DF, any rate $R$ given by,

$$R < \min\left( I(X_1; Y_2),\; I(X_1; Y_3) + C_0 \right)$$

is achievable, where the distribution $P_{X_1}(x_1)$ of $X_1$ is a
parameter that can be optimized, and the distributions of the
rest of the variables are derived from the channel transitions.
The maximum achievable rate $R_{\mathrm{DF}}$ can be shown to equal,

$$R_{\mathrm{DF}} = \min\left( 1 - \delta_2,\; 1 - \delta_3 + C_0 \right) \qquad (13)$$

Turning to CF, by [27] the following rates are achievable,

$$R < I(X_1; \hat{Y}_2, Y_3) \qquad (14)$$

$\hat{Y}_2$ is an auxiliary random variable which is dependent on
the relay output $Y_2$. The distributions $P_{X_1}(x_1)$ of $X_1$ and
$P_{\hat{Y}_2 | Y_2}(\hat{y}_2 \mid y_2)$ of $\hat{Y}_2$ (conditioned on $Y_2$) are parameters which
can be optimized. Their selection is constrained by the following
condition, which needs to be satisfied:

$$I(Y_2; \hat{Y}_2 \mid Y_3) < C_0 \qquad (15)$$

Evaluation of the optimal choices for $P_{X_1}(x_1)$ and
$P_{\hat{Y}_2 | Y_2}(\hat{y}_2 \mid y_2)$ is beyond the scope of our work. In this paper,
we confine ourselves to $X_1$ which is uniformly distributed in
$\{0, 1\}$ and $\hat{Y}_2$ which is distributed as

$$\hat{Y}_2 = Y_2 + \hat{E}_2 \qquad (16)$$

where $\hat{E}_2 \sim \mathrm{Erasure}(\hat{\delta}_2)$ and is independent of $Y_2$. $\hat{E}_2$
corresponds to distortion, as applied by CF at the relay (see
Sec. I-C).

Our choice for the distribution of $\hat{E}_2$ was guided by ease of
analysis (see footnote 16). We will make similar choices later, in our design
of methods based on soft-DF (Sec. III-C below), and so the
comparison will be fair. With these choices, (14) and (15)
become,

$$R < 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3 \qquad (17)$$

$$h(\delta_2 \circ \hat{\delta}_2) + (1 - \delta_2 \circ \hat{\delta}_2) \cdot \delta_3 - h(\hat{\delta}_2) \cdot (1 - \delta_2) < C_0 \qquad (18)$$

where the operation o is defined by (0. In the context of our 
discussion of distortion in Sec. I-C, $\hat{\delta}_2$ is the level of distortion
in the signal conveyed by the relay to the destination. We
define $R_{\mathrm{CF}}$ to equal $R$, as given by the right hand side of (17),
evaluated at the minimal $\hat{\delta}_2$ which satisfies (18) (note that
$\hat{\delta}_2 = 1$ renders the left-hand side zero, and so the minimal $\hat{\delta}_2$
is well-defined). Explicitly,

$$R_{\mathrm{CF}} = \max_{\hat{\delta}_2 \text{ satisfies } (18)} \left\{ 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3 \right\} \qquad (19)$$
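
The two benchmark rates can be evaluated numerically. The sketch below is a hedged illustration: it assumes that the composition operation on erasure probabilities is $a \circ b = 1 - (1 - a)(1 - b)$ (the erasure probability of two cascaded BECs, which is consistent with the CF expressions above but is not quoted from the paper's definition), that $C_0 > 0$, and that a simple grid search over the distortion level is accurate enough. All function names are hypothetical.

```python
import math


def h(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def compose(a, b):
    """Assumed form of the composition: erasure probability of two cascaded BECs."""
    return 1.0 - (1.0 - a) * (1.0 - b)


def df_rate(d2, d3, c0):
    """Decode-and-forward benchmark rate."""
    return min(1.0 - d2, 1.0 - d3 + c0)


def cf_rate(d2, d3, c0, steps=10000):
    """Compress-and-forward benchmark: best rate over feasible distortion levels."""
    best = 0.0
    for k in range(steps + 1):
        dq = k / steps  # candidate distortion level
        p = compose(d2, dq)
        constraint = h(p) + (1.0 - p) * d3 - h(dq) * (1.0 - d2)
        if constraint < c0:  # the relay link can carry the compressed signal
            best = max(best, 1.0 - p * d3)
    return best


print(df_rate(0.3, 0.6, 0.2))
print(cf_rate(0.3, 0.6, 0.2))
```

The search is over the whole grid rather than a bisection, so it does not rely on the constraint being monotone in the distortion level.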

15 The analysis of [27] specializes the results of [12] to settings where the
outputs at the destination are decoupled, as in our formulation.

16 Similar motivation guided the choice of auxiliary variables in [32,
Sec. VII-A], in the context of CF over the Gaussian relay channel.

The strategies described in [12], to achieve $R_{\mathrm{DF}}$ and $R_{\mathrm{CF}}$,
involve randomly generated codes (according to a uniform
distribution in $\{0, 1\}$), which are "good" for the point-to-point
BEC channel (see footnote 17). In Sec. V-A we will use them as benchmarks,
and provide examples of applications of soft-DF-BP that rely 
on "bad" codes, and outperform both these rates. In Sec. IH-DI 
we will further discuss related bounds on the performance of 
"good" codes. 

B. Definition of Soft-DF-BP 

As noted in Sec. I-C, soft-DF-BP is based on CF. With CF,
the relay forwards its channel observation un-decoded to the 
destination. With soft-DF-BP, it first attempts to estimate the 
transmitted codeword, and forwards the resulting estimate to 
the destination. The destination combines this estimate with its 
own channel observation, and attempts to decode the source's 
codeword using both. 

As also noted in Sec. I-C, the relay communicates its
estimate to the destination using its channel to that node. 
To fit the capacity of this channel, the estimate is first com- 
pressed, to reduce the required bandwidth. Often, compression 
is not enough, and the signal needs to be distorted (lossy 
compression) to further reduce its entropy. Some reduction 
in the necessary rate can also be achieved without distortion, 
using a variant of Wyner-Ziv coding [64]. With this approach, 
the destination exploits the signal it obtained via its channel 
from the source, as side-information when reconstructing the 
relay's estimate (from the signal communicated by that node). 
Specifically, it relies on the statistical dependencies between 
the two signals. 

A detailed discussion of compression at the relay and 
reconstruction at the destination, is provided by Cover and 
El Gamal [12, Theorem 6] in the context of communication 
using CF (the strategy's name having been coined later [32]). 
In this paper, we apply their results to communication using 
soft-DF-BP (see Sec. III-C and Appendix II below).

The details of soft-DF-BP are as follows.

Algorithm 2 (Soft-DF-BP):

• Source. Select a codeword $\mathbf{x}_1$ from an LDPC code $C$
(which will be specified later), and transmit it over the
channel.

• Relay.

1) Soft decoding. Apply BP (Algorithm 1) to compute
an estimate of $\mathbf{x}_1$ from the channel output $\mathbf{y}_2$ (see
Sec. I-C for a discussion of the application of BP
to estimation). The estimate is denoted $\mathbf{y}_2^{\mathrm{BP}}$.

2) Wyner-Ziv compression. Apply a vector quantizer to
map $\mathbf{y}_2^{\mathrm{BP}}$ to a distorted version $\hat{\mathbf{y}}_2^{\mathrm{BP}}$, and communicate
it to the destination (this will be elaborated in
Sec. III-C).

• Destination.

1) Reconstruction. Reconstruct $\hat{\mathbf{y}}_2^{\mathrm{BP}}$ from the signal
transmitted by the relay, using the output $\mathbf{y}_3$ of the
channel from the source as side-information.

2) Decoding. Apply BP to the vector $\hat{\mathbf{y}}_2^{\mathrm{BP}} \cdot \mathbf{y}_3$ (i.e.,
use this vector instead of $\mathbf{y}$ in Algorithm 1), where
multiplication is defined as in Sec. II-C. The output
of the algorithm is denoted $\mathbf{y}_3^{\mathrm{BP}}$.

17 More precisely, a sequence of codes generated in this way is "good" with
probability 1, the probability implied by their random generation.

Remark 2: Recall from our definition of BP (Sec. II-E) that
$\mathbf{y}_2^{\mathrm{BP}}$, the output of BP at the relay, is defined over the alphabet
$\{0, 1, e\}$.

Ultimately, our measure of performance is the erasure rate (see
Definition 2) at the output $\mathbf{y}_3^{\mathrm{BP}}$ of soft-DF-BP at the destination.

C. Analysis Framework 

Our analysis of soft-DF-BP will rely on a simple assumption,
which involves the statistical relation between the
relay estimate $\mathbf{Y}_2^{\mathrm{BP}}$ and its distorted version $\hat{\mathbf{Y}}_2^{\mathrm{BP}}$ (the random
variables corresponding to $\mathbf{y}_2^{\mathrm{BP}}$ and $\hat{\mathbf{y}}_2^{\mathrm{BP}}$ as defined above). We
begin by describing our assumption, and follow by a theorem
which justifies its use. Formally, the justification will apply to
a slightly modified version of soft-DF-BP (in comparison to
Algorithm 2). We conjecture that in practice the modification
is not needed, and our analysis results apply to the original
version as well.

We model the statistical relation between the components
of $\mathbf{Y}_2^{\mathrm{BP}}$ and $\hat{\mathbf{Y}}_2^{\mathrm{BP}}$ in the following way, which parallels (16) of
our discussion of CF.

$$\hat{Y}_{2,i}^{\mathrm{BP}} = Y_{2,i}^{\mathrm{BP}} + \hat{E}_{2,i} \qquad (20)$$

where $\hat{E}_{2,i}$ are i.i.d. erasure noise components, distributed as
Erasure($\hat{\delta}_2$). We refer to the vector $\hat{\mathbf{E}}_2$ as the quantization
noise vector. $\hat{\delta}_2$ is the quantization noise level, and will be
discussed shortly. It is convenient to view (20) as a stochastic
channel between $\mathbf{Y}_2^{\mathrm{BP}}$ and $\hat{\mathbf{Y}}_2^{\mathrm{BP}}$. We formulate this in the
following definition.

Definition 5: In the stochastic channel setup for analysis
of soft-DF-BP, Wyner-Ziv compression at the relay, and
reconstruction at the destination, are replaced by transmission
over a virtual BEC with erasure probability $\hat{\delta}_2$. We define
the $(\delta_2, \delta_3, \hat{\delta}_2)$ stochastic relay channel as the channel that
includes the physical links from the source to the relay and
the destination (which are BECs with erasure probabilities $\delta_2$
and $\delta_3$, respectively), and the above virtual BEC.

We will occasionally refer to the $(\delta_2, \delta_3, C_0)$ channel of
Sec. III-A as the physical relay channel, to distinguish it from
the above stochastic one.

In our analysis, we will be interested in the erasure rate
(Definition 2) at the output of the destination's BP decoder, in
the above stochastic channel setup.

The following theorem provides the formal justification for
our analysis.

Theorem 1: Let $(\delta_2, \delta_3, C_0)$ be the parameters of an erasure
relay channel as defined in Sec. III-A. Let $\{C_n\}_{n=1}^{\infty}$ be a
sequence of LDPC codes with rate $R$, where $n$ is the block
length of $C_n$, and let $\hat{\delta}_2, \epsilon \in [0, 1]$. Assume the following
conditions hold.

1) The erasure rate at the output of soft-DF-BP when
applied in the $(\delta_2, \delta_3, \hat{\delta}_2)$ stochastic channel setup, is
upper bounded by $\epsilon$, with a probability that approaches 1
with $n$.

2) The following inequality is satisfied for large enough $n$,

$$\frac{1}{n} I(\mathbf{Y}_2^{\mathrm{BP}}; \hat{\mathbf{Y}}_2^{\mathrm{BP}} \mid \mathbf{Y}_3) < C_0 \qquad (21)$$

Then a rate of $R \cdot (1 - h(\epsilon / R))$ is achievable using a modified
version of soft-DF-BP, over the physical $(\delta_2, \delta_3, C_0)$ channel.

The proof of this theorem relies on the analysis of CF [12,
Theorem 6] and is provided in Appendix II. The modified
version of soft-DF-BP introduces an extra outer code (see footnote 18), which
is concatenated with the LDPC code $C$, and replaces BP
decoding at the destination with joint-typicality decoding. It
preserves the main features of Algorithm 2, specifically soft
decoding by BP at the relay, and Wyner-Ziv compression.
Typically, we will be interested in negligibly small values of
$\epsilon$ (e.g. $10^{-6}$), and so the term $1 - h(\epsilon / R)$ will be very close
to 1.
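
To see how small the rate loss in Theorem 1 is for typical parameters, the following sketch evaluates $R \cdot (1 - h(\epsilon / R))$. The example values of $R$ and $\epsilon$, and the function names, are chosen for illustration only.

```python
import math


def h(p):
    """Binary entropy function, in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def achievable_rate(rate, eps):
    """Rate guaranteed by Theorem 1 for an LDPC rate and a residual erasure rate eps."""
    return rate * (1.0 - h(eps / rate))


r = achievable_rate(0.5, 1e-6)
print(r, 0.5 - r)
```

With $R = 0.5$ and $\epsilon = 10^{-6}$ the loss relative to $R$ is on the order of $10^{-5}$, which supports treating the factor $1 - h(\epsilon / R)$ as negligible.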

Recall that in our description of soft-DF-BP (Algorithm 2)
we left out the details of Wyner-Ziv compression at the relay
and reconstruction at the destination. In the modified version
of the algorithm (Theorem 1), we assume the implementations
described in the proof of [12, Theorem 6].

Equation (21) in the second condition in Theorem 1 parallels
(15) from the analysis of CF. The distributions of the
various random variables on the left hand side of (21) are
obtained from the following discussion: $\hat{\mathbf{Y}}_2^{\mathrm{BP}}$ is related to $\mathbf{Y}_2^{\mathrm{BP}}$
via (20). $\mathbf{Y}_2^{\mathrm{BP}}$ is a deterministic function of the random channel
output at the relay $\mathbf{Y}_2$, being the output of an application of
BP to this vector. $\mathbf{Y}_2$ and $\mathbf{Y}_3$ are both obtained from the
transmitted codeword $\mathbf{X}_1$ via the channels from the source to
the relay and destination, respectively. Finally, $\mathbf{X}_1$ is uniformly
distributed within the LDPC code $C_n$.

By our above discussion, given a $(\delta_2, \delta_3, C_0)$ erasure relay
channel, an application of soft-DF-BP to the channel involves
specifying not only the edge distributions $(\lambda, \rho)$ for a sequence
of LDPC codes (as in point-to-point communication) but
also the quantization noise level $\hat{\delta}_2$. In our analysis, we will
typically begin with a pair $(\lambda, \rho)$, and select the quantization
noise level by minimizing $\hat{\delta}_2$ subject to (21).

Our analysis now focuses on evaluating the two conditions
of Theorem 1. The first condition will be discussed in
Sections III-D and III-E and the second in Sec. III-F.

D. Background: Density Evolution 

Our analysis of the bit erasure rate at the output of soft-DF-BP
will rely on an extension of density evolution [47]. In this
section we review density evolution as applied to the analysis
of LDPC codes over point-to-point BECs, and examine the
difficulty in extending it to soft-DF-BP over erasure relay
channels. Overcoming this difficulty will be our focus in
Sec. III-E. A complete and rigorous discussion of density
evolution is available in, e.g., [36], [47], [48], and in this section
we restrict our discussion to its essentials.

Density evolution is a numerical algorithm for approximating
the bit erasure rate (Definition 2) at the output of BP
(Algorithm 1, Sec. II-E). Its approximation is asymptotically
precise, in the sense that the realized erasure rate can be
proven to approach it in probability, exponentially with the
LDPC block length $n$. Specifically, a concentration theorem
relates the realized erasure rate to the probability that an
individual message is an erasure. Density evolution computes
an asymptotically-precise estimate of this probability.

18 A similar technique was applied in various contexts, e.g. [17, Theorem 3]
implicitly relies on a similar derivation.

Consider a rightbound message $r_{ij}^{(\ell)}$ at iteration $\ell$ of BP
(see Algorithm 1). This message is a function of leftbound
messages that were computed in the preceding iteration, which
in turn are functions of other messages. It is useful to depict
this recursive structure in a computation graph, as shown in
Fig. 5. The variable node $i$ which produced $r_{ij}^{(\ell)}$ is drawn at the
bottom of the graph. The check nodes $j' \in \mathcal{N}(i) \setminus \{j\}$ whose
leftbound messages $l_{j'i}^{(\ell)}$ were used in the computation of $r_{ij}^{(\ell)}$
(see (9)) are drawn directly above node $i$. For each such check
node $j'$, the variable nodes whose rightbound messages were
used in the computation of $l_{j'i}^{(\ell)}$ are drawn above $j'$, and so
forth.

Fig. 5. The computation graph of a message $r_{ij}^{(\ell)}$.

We now make some simplifying assumptions. First, in this
section only, we confine our attention to regular LDPC codes
(see Sec. II-E). In the following sections, we will allow
irregular codes as well. The extension to such codes is possible
using the concepts of [37], [47] and will not be elaborated.
Second, we condition our analysis on the event that the
all-zero codeword was transmitted by the source. Our interest in
$r_{ij}^{(\ell)}$ is confined to the question of whether or not it equals
an erasure. An examination of BP (Algorithm 1) reveals that
this only depends on the channel transitions, and not on the
transmitted codeword, and thus our analysis will apply to other
cases as well. Having fixed the transmitted codeword, the
messages of BP, as well as its final output, are now functions
of the realizations of the BEC channel transitions alone (or
equivalently, the erasure noise components; see Sec. II-C). Lastly, we
assume that the computation graph (Fig. 5) contains no loops.
Assuming the code's Tanner graph was generated by the
random method that was mentioned in Sec. II-E, this can be
shown to hold, with high probability, at all but an exponentially
small (in $n$) fraction of the messages at iteration $\ell$ [47].

With these assumptions, a number of observations can be
made. First, computation graphs corresponding to different
rightbound messages at iteration $\ell$ have identical structures.
Since the channel transition probabilities are also identical,
this implies that the distribution of $R_{ij}^{(\ell)}$ (the random variable
corresponding to $r_{ij}^{(\ell)}$) is independent of the edge $(i, j)$. We
denote this distribution by $P_R^{(\ell)}$. A similar argument can be
made of the leftbound messages $\{l_{j'i}^{(\ell)}\}$ (see (9)) on which
the value of $r_{ij}^{(\ell)}$ relies, and we denote their distributions by
$P_L^{(\ell)}$.

Each leftbound message $l_{j'i}^{(\ell)}$ is a function of the channel
transitions corresponding to the variable nodes in the sub-graph
spanning upward (in Fig. 5) from the check node that
produced it. As the computation graph contains no loops,
sub-graphs corresponding to different messages do not intersect,
and their nodes are distinct. Since the channel transitions
are independent (by the memorylessness of the channel),
this implies that the variables $\{L_{j'i}^{(\ell)}\}$ (the random variables
corresponding to $\{l_{j'i}^{(\ell)}\}$) are also statistically independent.
Furthermore, they are independent of the channel output $Y_i$,
whose value also affects $r_{ij}^{(\ell)}$ in (9). These observations lead
to simple equations [49, Sec. III.A], which can be used to
compute ("evolve") $P_R^{(\ell)}$ from $P_L^{(\ell)}$ and $P_Y$ (the distribution
of $Y_i$). Similar arguments can be applied to compute $P_L^{(\ell)}$ from
$P_R^{(\ell - 1)}$ and so forth. These recursive equations are the basis
for density evolution.

Unfortunately, these equations do not apply straightforwardly
to the analysis of BP, as used by soft-DF-BP at the
destination (they do apply to its analysis at the relay). The
input to BP at the destination is $\hat{\mathbf{y}}_2^{\mathrm{BP}} \cdot \mathbf{y}_3$ (see Algorithm 2).
Unlike the output of a memoryless BEC, the erasures in $\hat{\mathbf{Y}}_2^{\mathrm{BP}}$
are in general not statistically independent. In the context of
our above discussion, this means that the leftbound messages
$\{L_{j'i}^{(\ell)}\}$ are now functions of dependent random variables, and
are thus no longer independent.
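
For the point-to-point BEC reviewed in this section, density evolution reduces to tracking a single erasure probability per iteration. The sketch below uses the standard textbook recursion $P_L^{(\ell)} = 1 - \rho(1 - P_R^{(\ell - 1)})$, $P_R^{(\ell)} = \delta \cdot \lambda(P_L^{(\ell)})$ with edge-degree polynomials; it is offered as an assumed illustration of the recursive equations mentioned above, not as the exact form used in the cited references. The dictionary representation and function names are hypothetical.

```python
def edge_poly(dist, x):
    """Evaluate an edge-degree polynomial: sum over degrees of fraction * x**(degree - 1)."""
    return sum(frac * x ** (deg - 1) for deg, frac in dist.items())


def density_evolution_bec(delta, lam, rho, t):
    """Predicted bit erasure rate after t BP iterations over a BEC(delta)."""
    norm = sum(frac / deg for deg, frac in lam.items())
    p_right = delta  # erasure probability of rightbound messages at iteration 0
    p_left = 1.0
    for _ in range(t):
        # A leftbound message is erased unless all other incoming messages are known.
        p_left = 1.0 - edge_poly(rho, 1.0 - p_right)
        # A rightbound message is erased if the channel and all other inputs are erased.
        p_right = delta * edge_poly(lam, p_left)
    # The final decision at a degree-i node uses all i incoming leftbound messages.
    node = sum((frac / deg / norm) * p_left ** deg for deg, frac in lam.items())
    return delta * node


print(density_evolution_bec(0.3, {3: 1.0}, {6: 1.0}, 200))
print(density_evolution_bec(0.5, {3: 1.0}, {6: 1.0}, 200))
```

For the (3, 6)-regular ensemble the predicted erasure rate vanishes at $\delta = 0.3$ but stays bounded away from zero at $\delta = 0.5$, where BP stalls with a large fraction of the bits still erased.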



E. Simultaneous Density Evolution 

To overcome this difficulty (as noted in Sec. I-C), in this
section we define a new algorithm which we call simultaneous-BP
(sim-BP). Sim-BP plays a role equivalent to both applications
of BP (at the relay and the destination) used by soft-DF-BP.
Recall from Sec. III-C that the setup of our analysis assumes a
stochastic relay channel (Definition 5). The input to sim-BP is
a triplet of vectors, $(\mathbf{y}_2, \mathbf{y}_3, \hat{\mathbf{e}}_2)$, where $\mathbf{y}_2$ and $\mathbf{y}_3$ correspond
to the relay and destination channel observations, and $\hat{\mathbf{e}}_2$ is
the realization of the quantization noise vector. The output of
sim-BP is a pair of estimates of the codeword $\mathbf{x}_1$ that was
transmitted by the source.

As noted in Sec. I-C, sim-BP cannot be realized, because
the relay and destination are physically separated, and so
combined access to both their channel observations ($\mathbf{y}_2$ and
$\mathbf{y}_3$) is not possible. The assumption that the algorithm has
access to the quantization noise $\hat{\mathbf{e}}_2$ is similarly unusual.
Sim-BP is thus intended only as a theoretical tool for analysis.
We will prove that its output is degraded (in the sense of
Definition 3) with respect to the output of soft-DF-BP, and
thus its performance can be used to bound that of soft-DF-BP.
Most importantly, its structure will enable rigorous analysis
using a variation of density evolution, which we will call
simultaneous density evolution (sim-DE).

Recall from Sec. III-D that our difficulty in applying density
evolution involved the vector $\hat{\mathbf{y}}_2^{\mathrm{BP}} \cdot \mathbf{y}_3$, which at the destination
replaces $\mathbf{y}$ of the definition of BP (Algorithm 1). The
components of the vector appear in the expression for the rightbound
message (9). We rewrite this expression below, explicitly as it
is used at the destination.



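$$r_{ij}^{(3,\ell)} = \begin{cases} \hat{y}_{2,i}^{\mathrm{BP}} \cdot y_{3,i}, & \ell = 0, \\ \hat{y}_{2,i}^{\mathrm{BP}} \cdot y_{3,i} \cdot \prod_{j' \in \mathcal{N}(i) \setminus \{j\}} l_{j'i}^{(3,\ell)}, & \ell > 0, \end{cases} \qquad (22)$$

where the superscript $(3, \ell)$ denotes messages at the destination
(node 3) at BP iteration $\ell$. By (20), the components of $\hat{\mathbf{y}}_2^{\mathrm{BP}}$ are
modeled by,

$$\hat{y}_{2,i}^{\mathrm{BP}} = y_{2,i}^{\mathrm{BP}} + \hat{e}_{2,i} \qquad (23)$$

To overcome the difficulties involving $\hat{\mathbf{y}}_2^{\mathrm{BP}} \cdot \mathbf{y}_3$, with sim-BP
we replace (22) with,

$$r_{ij}^{(3,\ell)} = \begin{cases} \hat{r}_{ij}^{(2,\ell)} \cdot y_{3,i}, & \ell = 0, \\ \hat{r}_{ij}^{(2,\ell)} \cdot y_{3,i} \cdot \prod_{j' \in \mathcal{N}(i) \setminus \{j\}} l_{j'i}^{(3,\ell)}, & \ell > 0, \end{cases} \qquad (24)$$

where,

$$\hat{r}_{ij}^{(2,\ell)} = r_{ij}^{(2,\ell)} + \hat{e}_{2,i} \qquad (25)$$

where $r_{ij}^{(2,\ell)}$ is the rightbound message computed by the relay's
BP decoder and $\hat{e}_{2,i}$ is the same realization of the erasure noise
as in (23) (which, as mentioned above, is included in the input
to the algorithm).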

More precisely, sim-BP operates as follows. It is a
message-passing algorithm, and alternates between leftbound and
rightbound iterations. Its messages are pairs, denoted $(r_{ij}^{(2,\ell)}, r_{ij}^{(3,\ell)})$
and $(l_{ji}^{(2,\ell)}, l_{ji}^{(3,\ell)})$. Components $r_{ij}^{(2,\ell)}$ and $l_{ji}^{(2,\ell)}$ are identical
to the messages exchanged with soft-DF-BP by the relay's
BP decoder. That is, they are functions of the relay channel
observation $\mathbf{y}_2$. The expressions by which they are computed
are obtained from (9) and (10) (replacing $y_i$, $r_{ij}^{(\ell)}$ and $l_{ji}^{(\ell)}$ with
$y_{2,i}$, $r_{ij}^{(2,\ell)}$ and $l_{ji}^{(2,\ell)}$). Components $r_{ij}^{(3,\ell)}$ and $l_{ji}^{(3,\ell)}$ parallel
messages exchanged with soft-DF-BP by the destination's BP
decoder, but are not identical to them. $r_{ij}^{(3,\ell)}$ is computed
using (24). $l_{ji}^{(3,\ell)}$ is computed using (10) (replacing $r_{ij}^{(\ell)}$ and
$l_{ji}^{(\ell)}$ with $r_{ij}^{(3,\ell)}$ and $l_{ji}^{(3,\ell)}$). While this expression for $l_{ji}^{(3,\ell)}$ is
identical to the one used with soft-DF-BP, the dependence on
$r_{ij}^{(3,\ell)}$ means that it too differs from the equivalent soft-DF-BP
messages. We briefly summarize this description below.

Algorithm 3 (Simultaneous Belief Propagation (sim-BP)):
1) Iterations. Perform the following steps, alternately, a
pre-determined $t$ times.
• Rightbound iteration number $\ell = 0, \ldots, t - 1$. At all
edges $(i, j)$ compute a rightbound pair $(r_{ij}^{(2,\ell)}, r_{ij}^{(3,\ell)})$
in the following way: $r_{ij}^{(2,\ell)}$ is computed by (9),
making substitutions as mentioned above. $r_{ij}^{(3,\ell)}$ is
computed by (24).
• Leftbound iteration number $\ell = 1, \ldots, t$. At all
edges $(j, i)$ compute a leftbound pair $(l_{ji}^{(2,\ell)}, l_{ji}^{(3,\ell)})$ in
the following way: Both components are computed
by (10), making substitutions as mentioned above.

2) Final decisions. For each $i = 1, \ldots, n$ compute a
pair $(y_{2,i}^{\mathrm{BP}}, y_{3,i}^{\mathrm{BP}})$ as follows: $y_{2,i}^{\mathrm{BP}}$ is computed as in BP
(Algorithm 1), expression (11), replacing $y_i$ and $l_{ji}^{(t)}$
with $y_{2,i}$ and $l_{ji}^{(2,t)}$. $y_{3,i}^{\mathrm{BP}}$ is computed using the same
expression, but replacing $y_i$ and $l_{ji}^{(t)}$ with $\hat{y}_{2,i}^{\mathrm{BP}} \cdot y_{3,i}$ and
$l_{ji}^{(3,t)}$, where $\hat{y}_{2,i}^{\mathrm{BP}}$ is computed by (23).

Note that while sim-BP outputs a pair of vectors $(\mathbf{y}_2^{\mathrm{BP}}, \mathbf{y}_3^{\mathrm{BP}})$,
in practice we are only interested in $\mathbf{y}_3^{\mathrm{BP}}$, which is an estimate
of the transmitted source codeword. The following theorem
relates it to the output of the destination's BP decoder with
soft-DF-BP.
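
The relation between sim-BP and soft-DF-BP can be explored on a toy example. The sketch below runs both procedures in the stochastic channel setup, on the same inputs, and checks the degradedness relation empirically. It rests on several assumptions that are not taken from the paper: erasures are encoded as `None`, the product of two symbols is erased only if both are erased, adding erasure noise erases a symbol exactly when the noise component is an erasure, the all-zero codeword is transmitted, and the parity checks of a (7,4) Hamming code stand in for an LDPC code. All helper names are hypothetical.

```python
import random


def combine(a, b):
    """Product of two observations of one bit: erased (None) only if both are."""
    return b if a is None else a


def add_noise(a, erased):
    """Erasure noise: the symbol survives unless the noise component is an erasure."""
    return None if erased else a


def xor_sum(values):
    """Modulo-2 sum, erased as soon as any term is erased."""
    return None if any(v is None for v in values) else sum(values) % 2


def leftbound(right, checks):
    """Each check sends the sum of the other incoming rightbound messages."""
    return {(j, i): xor_sum([right[(i2, j)] for i2 in c if i2 != i])
            for j, c in enumerate(checks) for i in c}


def rightbound(chan, left, nbrs, first):
    """Per-edge channel term times the other incoming leftbound messages."""
    right = {}
    for i, js in enumerate(nbrs):
        for j in js:
            msg = chan[(i, j)]
            if not first:
                for j2 in js:
                    if j2 != j:
                        msg = combine(msg, left[(j2, i)])
            right[(i, j)] = msg
    return right


def decide(y, left, nbrs):
    """Final decisions: channel term times all incoming leftbound messages."""
    out = []
    for i, js in enumerate(nbrs):
        dec = y[i]
        for j in js:
            dec = combine(dec, left[(j, i)])
        out.append(dec)
    return out


def bp(y, checks, nbrs, t):
    """Plain BP over the BEC with t >= 1 iterations."""
    chan = {(i, j): y[i] for i, js in enumerate(nbrs) for j in js}
    left = {}
    for ell in range(t):
        left = leftbound(rightbound(chan, left, nbrs, ell == 0), checks)
    return decide(y, left, nbrs)


def soft_df_bp(y2, y3, e2, checks, nbrs, t):
    """Destination output of soft-DF-BP in the stochastic channel setup."""
    y2_hat = [add_noise(v, e) for v, e in zip(bp(y2, checks, nbrs, t), e2)]
    return bp([combine(a, b) for a, b in zip(y2_hat, y3)], checks, nbrs, t)


def sim_bp(y2, y3, e2, checks, nbrs, t):
    """Destination component of sim-BP, with t >= 1 iterations."""
    chan2 = {(i, j): y2[i] for i, js in enumerate(nbrs) for j in js}
    left2, left3 = {}, {}
    for ell in range(t):
        right2 = rightbound(chan2, left2, nbrs, ell == 0)
        # The relay's intermediate message, plus noise, stands in for its final estimate.
        chan3 = {(i, j): combine(add_noise(m, e2[i]), y3[i]) for (i, j), m in right2.items()}
        right3 = rightbound(chan3, left3, nbrs, ell == 0)
        left2, left3 = leftbound(right2, checks), leftbound(right3, checks)
    y2_hat = [add_noise(v, e) for v, e in zip(decide(y2, left2, nbrs), e2)]
    return decide([combine(a, b) for a, b in zip(y2_hat, y3)], left3, nbrs)


def is_degraded(y, x):
    """True if every erasure of x is also an erasure of y."""
    return all(not (xi is None and yi is not None) for xi, yi in zip(x, y))


checks = [[0, 1, 2, 4], [0, 1, 3, 5], [0, 2, 3, 6]]
nbrs = [[j for j, c in enumerate(checks) if i in c] for i in range(7)]
rng = random.Random(0)


def random_instance(d2=0.5, d3=0.6, dq=0.2):
    """Draw channel observations and quantization noise for the all-zero codeword."""
    y2 = [None if rng.random() < d2 else 0 for _ in range(7)]
    y3 = [None if rng.random() < d3 else 0 for _ in range(7)]
    e2 = [rng.random() < dq for _ in range(7)]
    return y2, y3, e2


all_degraded = all(
    is_degraded(sim_bp(*inst, checks, nbrs, 3), soft_df_bp(*inst, checks, nbrs, 3))
    for inst in (random_instance() for _ in range(200))
)
print(all_degraded)
```

The check compares erasure patterns only, which suffices under the all-zero codeword assumption because every non-erased value is then 0.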

Theorem 2: Consider an instance of communication using
soft-DF-BP in a stochastic channel setup (Definition 5). Let
$\mathbf{y}_2, \mathbf{y}_3$ denote the channel observations, and $\hat{\mathbf{e}}_2$ denote the
quantization noise by which $\mathbf{y}_2^{\mathrm{BP}}$ and $\hat{\mathbf{y}}_2^{\mathrm{BP}}$ are related as in (20).
Let $\mathbf{y}_3^{\mathrm{BP}}$ denote the output of BP at the destination. Let ${\mathbf{y}'}_3^{\mathrm{BP}}$
denote the output of sim-BP when provided with precisely the
same vectors $(\mathbf{y}_2, \mathbf{y}_3, \hat{\mathbf{e}}_2)$. Then ${\mathbf{y}'}_3^{\mathrm{BP}}$ is degraded with respect
to $\mathbf{y}_3^{\mathrm{BP}}$ in the sense of Definition 3.

By this theorem, we can use sim-BP to upper bound the
erasure rate at the output of soft-DF-BP at the destination.
The theorem makes intuitive sense, because the messages
$\{r_{ij}^{(2,\ell)}\}$, which sim-BP uses in (24), are intermediate values,
computed in the process of BP, while components $y_{2,i}^{\mathrm{BP}}$, which
the destination's BP decoder uses in (22), are final decisions,
whose quality is expected to be better. This argument will be
made rigorous in Appendix III-A.

Sim-BP preserves the essential features of BP which make
its analysis using density evolution possible. Namely, it is a
message-passing algorithm, and its inputs $(\mathbf{y}_2, \mathbf{y}_3, \hat{\mathbf{e}}_2)$ are
vectors of independently distributed components (conditioned on
the transmission of the all-zero codeword). Simultaneous
density evolution (sim-DE) tracks the quantities $P_R^{(\ell)}(x_2, x_3)$ and
$P_L^{(\ell)}(x_2, x_3)$, corresponding to the joint probability functions
of message pairs $(R_{ij}^{(2,\ell)}, R_{ij}^{(3,\ell)})$ and $(L_{ji}^{(2,\ell)}, L_{ji}^{(3,\ell)})$
(upper-case denotes random variables), respectively.

The inputs to sim-DE are a triplet $(\delta_2, \delta_3, \hat{\delta}_2)$ and a pair
$(\lambda, \rho)$, that characterize the stochastic relay channel
(Definition 5) and the LDPC code (Sec. II-E), respectively. The details
of the algorithm are provided in Appendix III-B. The algorithm
concludes by outputting $P^{(\mathrm{Final})}(x_2, x_3)$, corresponding to the
distribution of a final decision pair $(Y_{2,i}^{\mathrm{BP}}, Y_{3,i}^{\mathrm{BP}})$. We let $P_e^{(\mathrm{Final})}$
denote the probability that $Y_{3,i}^{\mathrm{BP}} = e$, as evaluated by computing
the appropriate marginal distribution from $P^{(\mathrm{Final})}(x_2, x_3)$.
The following theorem relates this value to the performance
of sim-BP.

Theorem 3: Consider communication in the stochastic
channel setup (Definition 5). Let $(\mathbf{Y}_2, \mathbf{Y}_3, \hat{\mathbf{E}}_2)$ be the random
relay and destination channel observations and the quantization
noise. Assume the code $C$ used was selected at random from a
$(\lambda, \rho)$ LDPC ensemble of block length $n$ (see Sec. II-E). Let
$(\mathbf{Y}_2^{\mathrm{BP}}, \mathbf{Y}_3^{\mathrm{BP}})$ denote the output of sim-BP, when provided with
$(\mathbf{Y}_2, \mathbf{Y}_3, \hat{\mathbf{E}}_2)$ as inputs, and applied to the Tanner graph of $C$.
Then for any $\epsilon > 0$ and large enough $n$, the following holds,

$$\Pr\left[ \left| P_e(\mathbf{Y}_3^{\mathrm{BP}}) - P_e^{(\mathrm{Final})} \right| > \epsilon \right] < e^{-\beta \epsilon^2 n}$$




where $P_e^{(\mathrm{Final})}$ is as defined above, $P_e(\mathbf{Y}_3^{\mathrm{BP}})$ is as in
Definition 2 and $\beta > 0$ is some constant, which is independent of
$n$ and $\epsilon$.

The theorem implies that the erasure rate at the output of
sim-BP approaches sim-DE's prediction in probability,
exponentially in $n$. The proof follows along the same lines as the proof
of [47, Theorem 2] and is omitted.

Combined with Theorem 2, the theorem gives us an upper
bound on the erasure rate at the output of soft-DF-BP. In the
context of the analysis strategy of Sec. III-C, this addresses
the first of the two conditions of Theorem 1.

F. Analysis of the Quantization Noise $\hat{\mathbf{E}}_2$

We now examine the second condition of Theorem 1. In
our analysis in Sec. V-A, this condition will determine the
level $\hat{\delta}_2$ of the quantization noise, which we will choose as
the minimal value still satisfying (21).

In the evaluation of $I(\mathbf{Y}_2^{\mathrm{BP}}; \hat{\mathbf{Y}}_2^{\mathrm{BP}} \mid \mathbf{Y}_3)$ (the left hand side
of (21)), the distributions of the variables involved (see
Sec. III-C) are implicitly functions of the code $C$ in use,
through their dependence on the transmitted $\mathbf{X}_1$, which is
randomly distributed in $C$. When evaluating $I(\mathbf{Y}_2^{\mathrm{BP}}; \hat{\mathbf{Y}}_2^{\mathrm{BP}} \mid \mathbf{Y}_3)$,
the code $C$ is assumed to be fixed. In our analysis, however,
$C$ is randomly selected from an ensemble (see Sec. II-E). The
value of $I(\mathbf{Y}_2^{\mathrm{BP}}; \hat{\mathbf{Y}}_2^{\mathrm{BP}} \mid \mathbf{Y}_3)$ is thus a random variable whose
value is determined by $C$.

We begin with a naive bound on /(YJfsYSf | Y3). This 
bound closely resembles the left hand side of (1 1 8b . which was 
the evaluation of /(I2; Y2 Y3) in the context of communication 
using CF. The proof of the lemma is provided in Appendix II VI 

Lemma 1 (Naive bound): Let C be selected at random from
a $(\lambda, \rho)$ LDPC ensemble with block length n. Then the
following holds for large enough n, with probability at least
$1 - \exp(-\alpha n^{1/3})$ (the probability being over the random
selection of C),

$$\frac{1}{n} I(\hat{\mathbf{Y}}_2^{BP}; \mathbf{Y}_2^{BP} \mid \mathbf{Y}_3) \le h(\delta_2^{BP} \circ \hat{\delta}_2) + (1 - \delta_2^{BP} \circ \hat{\delta}_2) \cdot \delta_3 - h(\hat{\delta}_2) \cdot (1 - \delta_2^{BP}) + o(1) \qquad (26)$$

where $\delta_2^{BP}$ is the expected erasure rate at the output of the
relay's BP decoder (Sec. III-B), as computed by density evolution
(Sec. III-D). $o(1)$ is some function of n, dependent on
$\lambda$, $\rho$ and t (the number of relay BP iterations), that approaches
zero with n. $\alpha > 0$ is a constant similarly dependent on $\lambda$, $\rho$
and t.

The above bound, however, does not exploit strong dependencies
that often exist between the components of the vector
$\mathbf{Y}_2^{BP}$ (as mentioned in Sec. III-D), which result from the
simple structures of LDPC codes. The following theorem
exploits these dependencies to produce a stronger bound.
Unlike Lemma 1, our bound in this theorem applies to right-regular
LDPC codes only (see Sec. II-E).

Theorem 4: Let C be selected at random from a right-regular
LDPC ensemble $(\lambda, d)$ with block length n. Then the
following holds for large enough n, with probability at least
$1 - \exp(-\alpha n^{1/3})$,

$$\frac{1}{n} I(\hat{\mathbf{Y}}_2^{BP}; \mathbf{Y}_2^{BP} \mid \mathbf{Y}_3) \le I^+(\hat{\delta}_2) + o(1) \qquad (27)$$

where $\alpha$ and $o(1)$ are defined as in Lemma 1,



I+(S 2 ) =l.d.f. 



(28) 



where l.d.f.[f] denotes the largest descending function that is
upper bounded by $f(\cdot)$. That is,

$$\text{l.d.f.}[f](x) = \inf_{t \le x} f(t)$$



and, 



A(6 2 ) + h(5f o 62) 

Mh) + f(si r ) + (i-s : 



2 ) • Hh 



(29) 
(30) 



where $\delta_2^{BP}$ is defined as in Lemma 1, and $A(\hat{\delta}_2)$ and $f(\alpha)$ are
provided by equations (31) and (32) on the following page.
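
The l.d.f. operator used in the theorem can be illustrated numerically. The following is a minimal sketch, assuming that f is only known through samples on an increasing grid, so that the infimum over $t \le x$ reduces to a running minimum. The function name `largest_descending` and the sample values are hypothetical and do not come from the paper.

```python
from itertools import accumulate


def largest_descending(samples):
    """Running minimum of f sampled on an increasing grid.

    samples[k] holds f(x_k) with x_0 < x_1 < ...; the returned list g
    satisfies g[k] = min(samples[0..k]), a discrete stand-in for
    inf over t <= x_k of f(t).
    """
    return list(accumulate(samples, min))


# Hypothetical samples of a function that is not monotone.
f_values = [1.0, 0.8, 0.9, 0.5, 0.7, 0.4]
g_values = largest_descending(f_values)
```

By construction the result never increases and never exceeds the sampled function, which is the defining property of the largest descending function.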

The proof of the theorem is provided in Appendix V. In the
proof, we make use of the following dependencies between
the components of $\mathbf{Y}_2^{BP}$.

1) Dependencies between bits discovered by BP: Each
component $Y_{2,i}^{BP}$ that corresponds to a bit that was erased by
the channel but discovered at some BP iteration is dependent on,
and equal to the sum (modulo-2) of, $d - 1$ components
$Y_{2,i_1}^{BP}, \ldots, Y_{2,i_{d-1}}^{BP}$ that correspond to other bits. This is best seen
by examining the following simplified algorithm, due to Luby et al.
[36, Algorithm 1], which is equivalent to BP. Like BP, this algorithm is
iterative, and relies on the Tanner graph representation of the
LDPC code. However, it is not a message-passing algorithm.

Algorithm 4 (Simplified-BP): 

1) Initialization. Set the value of each variable node to the 
channel output. 

2) Iteration $\ell = 1, \ldots, t$. Perform the following operation
simultaneously at all check nodes: At check node j, if
the values at all but one of the adjacent variable nodes are
known (not erased), set the remaining unknown variable
node to the modulo-2 sum of the others.

In [48, Problem 3.10] this algorithm was shown to yield
precisely the same output as BP. Therefore, we may assume
without loss of generality that it is the one applied by the
relay of soft-DF-BP. Each bit i uncovered by this algorithm
(and by extension, BP) is clearly equal to the sum of $d - 1$
bits $i_1, \ldots, i_{d-1}$ that were revealed in previous iterations, or
by the channel. Equivalently stated, the entropy of $Y_{2,i}^{BP}$, given
$Y_{2,i_1}^{BP}, \ldots, Y_{2,i_{d-1}}^{BP}$, is zero. A good lossy compression scheme
for $\mathbf{Y}_2^{BP}$ can exploit this and spend less rate to describe $Y_{2,i}^{BP}$,
under the assumption that with high probability, the decoder
will have access to all of $Y_{2,i_1}^{BP}, \ldots, Y_{2,i_{d-1}}^{BP}$ (this is made rigorous
in Appendix V).
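
The two steps of Simplified-BP can be sketched in a few lines. This is an illustrative sketch only: the representation of the Tanner graph as a list of check-node neighbourhoods, the use of `None` for an erasure, and the toy code at the bottom are assumptions made for the example, not details taken from the paper.

```python
def simplified_bp(received, checks, iterations):
    """Simplified-BP (peeling) over the BEC.

    received: list with entries 0, 1 or None (None marks an erasure).
    checks: list of check nodes, each a list of adjacent variable indices.
    iterations: number of parallel iterations t.
    """
    values = list(received)  # step 1: initialise with the channel output
    for _ in range(iterations):
        updates = {}
        for neighbours in checks:  # all check nodes read the same snapshot
            unknown = [v for v in neighbours if values[v] is None]
            if len(unknown) == 1:
                known_sum = sum(values[v] for v in neighbours if values[v] is not None)
                updates[unknown[0]] = known_sum % 2
        if not updates:
            break  # no check node can resolve anything further
        for v, bit in updates.items():
            values[v] = bit
    return values


# Hypothetical toy code: x0+x1+x2 = 0, x2+x3+x4 = 0, x0+x3+x5 = 0 (mod 2).
checks = [[0, 1, 2], [2, 3, 4], [0, 3, 5]]
codeword = [1, 0, 1, 1, 0, 0]
received = [1, None, 1, None, 0, 0]
decoded = simplified_bp(received, checks, iterations=3)
```

In this example both erased bits are recovered in the first iteration, each as the modulo-2 sum of the other two bits attached to a check node, which is the dependency exploited in the text.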

2) Dependencies between erasures at the output of BP:

To adequately communicate $\mathbf{Y}_2^{BP}$ to the destination (or its
distorted version, $\hat{\mathbf{Y}}_2^{BP}$, see Sec. III-B), the relay must be
able to approximately convey the locations (time indices)
of the components of $\mathbf{Y}_2^{BP}$ that are equal to erasure. Unlike the
output of a memoryless BEC, these locations are not arbitrary.
Di et al. [14, Lemma 1.1] proved that they correspond to
a stopping set of the code C (see [14] for its definition).
Typically, the number of stopping sets of a given size is significantly
smaller than the number of similar-sized (arbitrary)
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A(5 2 ) = 
/(«) = 

7 = J2 i -~ x 



,);[]-<),) (I - I + (& ~ 6?) (l — (1 — '"»_> 1 



d-1 



max 

/3G(0, 7 



log inf 

x>0,y>0 



log inf 

x>0 



(1 



[(1 + x) d - d ■ x] 



(l-R) 




(31) 
(32) 



subsets of $\{1, \ldots, n\}$. Thus, for the relay to describe to the
destination the locations of the erasures in $\mathbf{Y}_2^{BP}$, it can conserve
rate by providing the serial number of a stopping set (given
some enumeration of the stopping sets) rather than describing
the precise indices. This is exploited by the bound implied
by (30) above.
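
The gap between the number of stopping sets and the number of arbitrary subsets can be seen already on a very small graph. The sketch below uses the usual definition of a stopping set (a set of variable nodes such that no check node has exactly one neighbour in the set) and counts stopping sets by brute force. The toy Tanner graph is hypothetical, and exhaustive enumeration is only feasible for tiny block lengths.

```python
from itertools import combinations


def is_stopping_set(subset, checks):
    """True if no check node has exactly one neighbour inside subset."""
    members = set(subset)
    return all(sum(v in members for v in neighbours) != 1 for neighbours in checks)


def count_stopping_sets(n, checks, size):
    """Number of size-`size` subsets of {0, ..., n-1} that are stopping sets."""
    return sum(is_stopping_set(s, checks) for s in combinations(range(n), size))


# Hypothetical toy Tanner graph with n = 6 variable nodes and 3 check nodes.
checks = [[0, 1, 2], [2, 3, 4], [0, 3, 5]]
n = 6
counts = {size: count_stopping_sets(n, checks, size) for size in range(n + 1)}
```

For this graph only 18 of the 64 subsets are stopping sets, so indexing a stopping set takes fewer bits than listing arbitrary erasure positions.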

In Fig. 9 (Sec. V-A) we have plotted the bounds of Lemma 1
and Theorem 4 for a specific LDPC code ensemble, as well
as a bound that applies to "good" codes. A discussion of the
bounds is provided in that section.

G. Limitations of "Good" Codes 

In Sec. III-A we mentioned $R_{DF}$ and $R_{CF}$ as benchmarks
for the performance of "good" codes. Unlike our analysis in
Sec. IV-C (in the context of interference channels), we were
not able to obtain tight bounds that apply in general to any
relay strategy that involves "good" codes. In this section, we
nonetheless examine soft-DF-BP, and point out limitations on
its components, when the code C it relies on (see Algorithm 2)
is taken from a sequence of "good" codes. We also provide
bounds on the achievable rates in this setting, which hold under
certain plausible assumptions. We conjecture that in practice,
soft-DF-BP, as well as other relay strategies, is restricted by
similar bounds, when confined to "good" codes.

In our analysis below, we distinguish between two ranges
for the code rate R: $R < 1 - \delta_2$ and $R > 1 - \delta_2$. In the
first range, R is below the capacity of the source-relay link.
As we will see below, in this range "good" codes enjoy an
advantage over "bad" ones, in terms of soft decoding at the
relay. Specifically, complete decoding is possible, and thus
methods like DF (see Sec. III-A), which involve decoding at
the relay, are also possible. However, the rates in this range
are limited. We define the upper bound,

$$R_{DF-UB} = 1 - \delta_2 \qquad (33)$$

to be the maximum rate in this range.

Our main focus is on the second range, $R > 1 - \delta_2$. In this
range, R exceeds the capacity of the source-relay link, and
thus complete decoding at the relay is not possible. Methods
like CF, however, which do not involve complete decoding,
are potentially possible. In our development of soft-DF-BP, in
Sec. III-B, our objective was to improve upon CF. We will
argue that this is unlikely to be possible when soft-DF-BP is
used with "good" codes.

We begin by considering soft decoding, as applied by soft-DF-BP
at the relay. Our analysis will rely on the following
theorem, which examines the estimation error at a destination
node of a point-to-point BEC. The theorem focuses on
$P_{MAP}(C; \delta)$, the expected bit erasure rate (Definition 5) at the
output of a maximum a posteriori (MAP) decoder for a linear
code$^{19}$ C, when used over a BEC($\delta$).

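Theorem 5: Let $\{C_n\}_{n=1}^{\infty}$ be a sequence of linear codes, of
rate R, which is "good" for the BEC (as defined earlier). Let
$\delta^* = 1 - R$ be the BEC Shannon limit for rate R (Definition 1).
Then the following holds,

$$\lim_{n \to \infty} P_{MAP}(C_n; \delta) = \begin{cases} \delta, & \delta > \delta^*; \\ 0, & \delta < \delta^*. \end{cases} \qquad \forall \delta \in [0,1],\ \delta \ne \delta^* \qquad (34)$$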



The results of this theorem resemble the ones that were
presented in Fig. 1. Specifically, at high values of $\delta$ (paralleling
low SNRs in Fig. 1), MAP decoding of "good" codes collapses,
and its output closely resembles the raw channel signal
at its input. The proof of the theorem is a variation of the proof
of [45, Equation (14)] and is provided in Appendix VI-A.
It relies on the relationship between mutual information and
input estimates, which was recently discovered in several
contexts (see Palomar and Verdu [44], Measson et al. [39]
and Ashikhmin et al. [2]).
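
The limit in (34) is a simple threshold rule, and a small sketch makes the "collapse" above the Shannon limit concrete. The function name `map_erasure_limit` is hypothetical, and the sketch only encodes the asymptotic statement of the theorem; it says nothing about finite block lengths.

```python
def map_erasure_limit(delta, rate):
    """Asymptotic MAP bit erasure rate of a "good" code sequence, as in (34)."""
    shannon_limit = 1.0 - rate  # delta* = 1 - R
    if delta == shannon_limit:
        raise ValueError("(34) does not specify the limit at delta = delta*")
    return delta if delta > shannon_limit else 0.0


# Rate 0.5053 is the rate quoted in Sec. V-A; the erasure probabilities are examples.
above_limit = map_erasure_limit(0.5, 0.5053)   # channel worse than the Shannon limit
below_limit = map_erasure_limit(0.4, 0.5053)   # channel better than the Shannon limit
```

Above the Shannon limit the returned value equals the channel erasure probability itself, meaning that estimation yields no improvement over the raw channel output.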

The setting of Theorem 5 is more general than required
for an analysis of soft-DF-BP. It applies to arbitrary "good"
linear codes$^{20}$, rather than "good" LDPC codes, and to MAP
decoding (estimation) rather than BP decoding. The application
to soft BP decoding of "good" LDPC codes at a relay is
obtained as a corollary, in the following way. We will apply
the theorem to examine the source-relay link. As noted above,
our focus is on rates in the range $R > 1 - \delta_2$, or equivalently
on erasure probabilities $\delta_2 > 1 - R = \delta^*$ (where $\delta^*$ is as
defined in Theorem 5). This is precisely the range where MAP
estimation of "good" codes collapses. By the optimality of
MAP decoding, the expected erasure rate at the output of BP
cannot be lower than $P_{MAP}(C; \delta_2)$, and thus BP estimation
collapses too.

It is still possible that BP estimation achieves a reduction in
the number of erasures which is sub-linear in the block length,
and that this reduction produces a meaningful benefit. We
conjecture that this is not the case. A more detailed analysis
is deferred to later work. Formally, we make the following
conjecture.

Conjecture 1: Consider the asymptotic performance of soft-DF-BP,
assuming that the code C in Algorithm 2 was taken
from a sequence of "good" codes. Then an analysis that
assumes that the output $\mathbf{Y}_2^{BP}$ of BP at the relay is identical
to the channel output $\mathbf{Y}_2$ involves no loss in generality.
Similarly, we may assume $\hat{\mathbf{Y}}_2^{BP} = \hat{\mathbf{Y}}_2$, where the components
of $\hat{\mathbf{Y}}_2$ and $\hat{\mathbf{Y}}_2^{BP}$ are given by (16) and (20), respectively.

19 MAP decoding of linear codes over the BEC produces vectors over the
alphabet $\{0, 1, e\}$ (see e.g. [48, Sec. 3.2.1]). More precisely, the bitwise
a-posteriori probability $P_{X_i|\mathbf{Y}}(1 \mid \mathbf{y})$ on which it relies can be shown to belong
to the set $\{0, 1, 1/2\}$, indicating complete confidence (0 or 1) in the decoded
bit or complete lack of it (1/2). In some formulations of the algorithm, a
random decision is made when $P_{X_i|\mathbf{Y}}(1 \mid \mathbf{y}) = 1/2$. In this paper, we
assume 1/2 is mapped to e, producing the desired alphabet.

20 Our choice to focus on linear codes was made for simplicity. We believe
that the result can easily be extended to arbitrary "good" codes.

We now proceed to consider Wyner-Ziv compression as
applied by soft-DF-BP at the relay. By our discussion in
Sec. III-C (following Theorem 1), in our analysis of soft-DF-BP
we select the quantization noise level by minimizing $\hat{\delta}_2$
subject to (21). In Sec. III-A, the left hand side of (15), which
is the CF equivalent of (21), was evaluated to equal the left
hand side of (18) (subject to (16)). In Theorem 4 (Sec. III-F)
we have seen that with "bad" codes we can do better, and
the left hand side of (21) is upper bounded by a value that
is often lower than the left hand side of (18). We now wish
to show that "good" codes do not exhibit a similar advantage.
This is partially addressed by Theorem 6 below. Note that
in this theorem we focus on $\frac{1}{n} I(\hat{\mathbf{Y}}_2; \mathbf{Y}_2 \mid \mathbf{Y}_3)$ rather than
$\frac{1}{n} I(\hat{\mathbf{Y}}_2^{BP}; \mathbf{Y}_2^{BP} \mid \mathbf{Y}_3)$ as in (21), following Conjecture 1 above.

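Theorem 6: Let $\hat{\delta}_2 > 0$, and let $\{C_n\}_{n=1}^{\infty}$ and R be defined
as in Theorem 5. Assume,

$$R = 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3 \qquad (35)$$

Then the following holds:

$$\frac{1}{n} I(\hat{\mathbf{Y}}_2; \mathbf{Y}_2 \mid \mathbf{Y}_3) = h(\delta_2 \circ \hat{\delta}_2) + (1 - \delta_2 \circ \hat{\delta}_2) \cdot \delta_3 - h(\hat{\delta}_2) \cdot (1 - \delta_2) + o(1) \qquad (36)$$

where $o(1)$ approaches zero with n (n being the block length
of $C_n$, see Remark 1).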



The proof of the theorem is provided in Appendix VI-B.
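
The expressions (35) and (36) are straightforward to evaluate numerically. The sketch below is illustrative and rests on two assumptions that are not stated in this section: that $a \circ b$ denotes the erasure probability $1 - (1-a)(1-b)$ of two cascaded erasure stages, and that $h(\cdot)$ is the binary entropy function in bits. The function names and the bisection search are hypothetical helpers, and the $o(1)$ term is ignored.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def compose(a, b):
    """Assumed meaning of the composition of a and b: two cascaded erasure stages."""
    return 1.0 - (1.0 - a) * (1.0 - b)


def wyner_ziv_rate(delta2, delta2_hat, delta3):
    """Right-hand side of (36), without the o(1) term."""
    q = compose(delta2, delta2_hat)
    return binary_entropy(q) + (1.0 - q) * delta3 - binary_entropy(delta2_hat) * (1.0 - delta2)


def minimal_noise_level(delta2, delta3, c0, tol=1e-10):
    """Smallest delta2_hat with wyner_ziv_rate <= c0, found by bisection.

    Assumes the rate decreases in delta2_hat over [0, 1].
    """
    if wyner_ziv_rate(delta2, 0.0, delta3) <= c0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if wyner_ziv_rate(delta2, mid, delta3) > c0:
            lo = mid
        else:
            hi = mid
    return hi


delta2, delta3, c0 = 0.5, 0.82, 0.9  # channel parameters quoted in Sec. V-A
delta2_hat = minimal_noise_level(delta2, delta3, c0)
rate = 1.0 - compose(delta2, delta2_hat) * delta3        # right-hand side of (35)
lossless_capacity = wyner_ziv_rate(delta2, 0.0, delta3)  # rate needed when delta2_hat = 0
```

Under these assumptions the sketch gives a noise level close to 0.223, a rate close to 0.4987 and a lossless requirement of 1.41, in line with the "good" code figures reported in Sec. V-A.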

We would now like to determine the rates that are achievable
with soft-DF-BP, when confined to "good" codes. While
Theorem 1 (Sec. III-C) provides a set of achievable rates,
the conditions of the theorem have only been proven to be
sufficient, but not necessary, for a rate to be achievable. We
nonetheless make the following conjecture.

Conjecture 2: The conditions of Theorem 1 are necessary
as well as sufficient for a rate to be achievable with soft-DF-BP.

The conditions of Theorem 1 are closely related to the
conditions of CF, (14) and (15) (Sec. III-A). Specifically,
the first condition of the theorem involves performance over
the $(\delta_2, \delta_3, \hat{\delta}_2)$ stochastic relay channel (Definition 5), whose
outputs are $\hat{\mathbf{Y}}_2$ and $\mathbf{Y}_3$ (where we have substituted $\hat{\mathbf{Y}}_2^{BP}$ by
$\hat{\mathbf{Y}}_2$ as explained above). The achievable rates in this setting
are bounded by the capacity of the channel from $\mathbf{X}_1$ to
$(\hat{\mathbf{Y}}_2, \mathbf{Y}_3)$, which is $1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3$.$^{21}$ The first condition of
Theorem 1 thus implies (17) (which was evaluated from (14),
see Sec. III-A). Following Conjecture 1, we replace the second
condition with the following inequality,

$$\frac{1}{n} I(\hat{\mathbf{Y}}_2; \mathbf{Y}_2 \mid \mathbf{Y}_3) \le C_0 \qquad (37)$$

21 Note that in this discussion, we focus on performance in terms of frame
errors (rather than bit errors). In the context of Theorem 1, this is equivalent
to setting $\epsilon = 0$.



By the above discussion, the rates achievable with soft-DF-BP,
when confined to "good" sequences of codes, subject to
Conjectures 1 and 2, are upper-bounded by,

$$R_{UB} = \sup_{\text{"good" } \{C_n\}} \left\{ R = R_{\{C_n\}} \;:\; \exists\, \hat{\delta}_2 \in [0,1] \text{ such that } R \le 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3 \text{ and (37) holds for large enough } n \right\} \qquad (38)$$

The supremum is over all "good" code sequences, and $R_{\{C_n\}}$
denotes the rate of the sequence $\{C_n\}_{n=1}^{\infty}$.

Finally, in Appendix VI-C we apply Theorem 6 to show
that $R_{UB}$ is upper bounded by $R_{CF}$. In Sec. V-A we will rely
on $R_{DF-UB}$ and $R_{CF}$ as benchmarks for the rates that are
achievable with soft-DF-BP, when confined to "good" code
sequences.

IV. Coding for the Symmetric BIAWGN 
Interference Channel 

A. Channel Model and Achievable Strategies 

Fig. 6 depicts the $(h, \sigma)$ symmetric BIAWGN interference
channel. The channel transition probabilities are defined by
the following equations,

$$Y_1 = X_1 + h \cdot X_2 + Z_1$$
$$Y_2 = h \cdot X_1 + X_2 + Z_2 \qquad (39)$$

where $Y_1$ and $Y_2$ are the channel outputs at the two destinations
(respectively), and $X_1$ and $X_2$ are the transmitted signals.
Unlike typical formulations of AWGN interference channels
(e.g. [18]), we restrict $X_1$ and $X_2$ to $\{\pm 1\}$. $Z_1$ and $Z_2$
are statistically independent zero-mean real-valued Gaussian
random variables with variance $\sigma^2$, whose realizations at
different time instances are also independent. $h$ and $\sigma$ are
positive constants, known to all nodes, and $h$ is restricted to
the range $h \in (0, 1)$ (i.e. we are interested in weak interference
scenarios [11]).
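
The channel model (39) can be simulated directly. The following sketch is for illustration only: the function name, the block length and the use of a seeded generator are assumptions, and uncoded random bits stand in for codewords.

```python
import random


def simulate_interference_channel(x1, x2, h, sigma, rng):
    """Pass two equal-length +/-1 sequences through the channel of (39)."""
    y1 = [a + h * b + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
    y2 = [h * a + b + rng.gauss(0.0, sigma) for a, b in zip(x1, x2)]
    return y1, y2


rng = random.Random(0)  # fixed seed, only for reproducibility
n = 8                   # hypothetical block length
x1 = [rng.choice((-1, 1)) for _ in range(n)]
x2 = [rng.choice((-1, 1)) for _ in range(n)]
y1, y2 = simulate_interference_channel(x1, x2, h=0.839, sigma=1.075, rng=rng)
```

The noise samples for the two destinations are drawn separately, reflecting the independence of $Z_1$ and $Z_2$.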



Fig. 6. The symmetric BIAWGN interference channel.

We define achievable strategies for this channel in the standard
way (e.g. [18, Sec. II]). Theoretical analysis of interference
channels typically focuses on the capacity region, i.e. the set
of pairs $(R_1, R_2)$ such that communication at rate $R_1$ (resp.
$R_2$) is achievable between the first (resp. second) source-destination
pair. In this paper we confine our attention to
achievable symmetric rates, that is, values $R > 0$ such that
the pair $(R, R)$ is achievable. In our discussion, the following
terminology will be useful.

Definition 6: When considering one of the two destinations 
of an interference channel, we refer to the corresponding 
source, which produced the message that the destination must 
decode, as the primary source. The other source is called the 
interfering source. 

Two sub-optimal communication strategies that frequently appear
as benchmarks in the literature on interference channels are
multi-user detection (MUD) and single-user detection (SUD).
With MUD, each destination attempts to decode the interference,
as well as the desired primary signal (i.e. the one produced by
the primary source). With SUD, each destination treats the
interference as noise, alongside the channel noise. Achievable
rates with both strategies can be evaluated from the following
expressions.

$$R_{MUD} = \min\left( I(X_2; Y_1 \mid X_1),\ \tfrac{1}{2} I(X_1, X_2; Y_1) \right) \qquad (40)$$

$$R_{SUD} = I(X_1; Y_1) \qquad (41)$$

While in general, the distributions $P_{X_1}(x_1)$ of $X_1$ and
$P_{X_2}(x_2)$ of $X_2$ are parameters to be optimized, in this
paper we confine our attention to $X_1$ and $X_2$ that are uniformly
distributed in $\{\pm 1\}$. With these choices, (40) and (41) can be
evaluated numerically. Note that $X_1$ and $X_2$ are assumed to
be independently distributed. Additionally, on the right hand
side of (41), the distribution of $Y_1$ is implicitly a function
of $P_{X_2}(x_2)$ (as well as $P_{X_1}(x_1)$) by virtue of (39). The
expressions (40) and (41) are easily obtained from the analysis
of multiple-access channels [13, Theorem 14.3.3] and point-to-point
(single-user) channel capacity [13, Theorem 8.7.1],
respectively.
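
One way to carry out the numerical evaluation of (40) and (41) is to write each mutual information as a difference of differential entropies of Gaussian mixtures. The sketch below follows that route under the uniform $\{\pm 1\}$ input assumption; the function names, the Riemann-sum integration and its grid parameters are choices made for this example and are not taken from the paper.

```python
import math


def gaussian_mixture_entropy(means, sigma, steps=20000, span=10.0):
    """Differential entropy (bits) of an equal-weight Gaussian mixture, by a Riemann sum."""
    lo = min(means) - span * sigma
    hi = max(means) + span * sigma
    dy = (hi - lo) / steps
    norm = 1.0 / (len(means) * sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * dy
        pdf = norm * sum(math.exp(-(y - m) ** 2 / (2.0 * sigma ** 2)) for m in means)
        if pdf > 0.0:
            total -= pdf * math.log2(pdf) * dy
    return total


def benchmark_rates(h, sigma):
    """Return (R_MUD, R_SUD) of (40) and (41) for uniform +/-1 inputs."""
    h_noise = gaussian_mixture_entropy([0.0], sigma)       # h(Z_1)
    h_interf = gaussian_mixture_entropy([h, -h], sigma)    # h(Y_1 | X_1)
    h_output = gaussian_mixture_entropy([1 + h, 1 - h, -1 + h, -1 - h], sigma)  # h(Y_1)
    i_x2_given_x1 = h_interf - h_noise   # I(X_2; Y_1 | X_1)
    i_joint = h_output - h_noise         # I(X_1, X_2; Y_1)
    i_x1 = h_output - h_interf           # I(X_1; Y_1)
    return min(i_x2_given_x1, 0.5 * i_joint), i_x1


r_mud, r_sud = benchmark_rates(h=0.839, sigma=1.075)
```

The three mutual informations satisfy the chain rule $I(X_1, X_2; Y_1) = I(X_1; Y_1) + I(X_2; Y_1 \mid X_1)$ by construction, which is a useful sanity check on any numerical implementation.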

The strategies described in [13], to achieve rates (40)
and (41), involve randomly generated codes (according to a
uniform distribution in $\{\pm 1\}$), which are "good" for the point-to-point
BIAWGN channel$^{22}$. Furthermore, in Sec. IV-C we
will show that they are in fact the best that can be done in
communication with any "good" code sequence. In Sec. V-B
we will thus use them as benchmarks for communication using
"good" codes.

B. Definition and Analysis Framework for Soft-IC-BP 

We assume the two sources use LDPC codes, whose iden- 
tities will be discussed later. Soft-IC-BP is applied at each of 
the two destinations of the interference channel. 

At each destination, soft-IC-BP attempts to decode the
primary codeword (see Definition 6), and produces an estimate
of the interference as a byproduct. As noted in Sec. I-C, the
algorithm coincides with iterative-MUD as defined e.g. by [6],
[1], [51] (and references therein). Unlike standard applications
of iterative-MUD, however, in applications of soft-IC-BP we
tolerate a large error in the estimation of the interference (but
not the primary codeword).



We now provide a general overview of soft-IC-BP and of 
its analysis methods. A complete discussion is available in 
the above-mentioned references. Our description is intended 
to point out specific features of our implementation, and also 
as background for the optimization of LDPC codes, which will 
be discussed in Sec. V-B and Appendix VIII-B.

It is convenient to perceive the operation of soft-IC-BP at 
each destination, as the parallel operation of two decoders, the 
first decoding the primary codeword, and the second estimat- 
ing the interference. The two decoders iteratively exchange 
information to improve their respective performance. At each 
iteration, the information each decoder obtains from the other, 
assists it in better canceling the signal produced by the other 
source, in order to better estimate its own signal. 

More precisely, soft-IC-BP progresses through the exchange
of messages between the nodes of a factor graph [33],
which represents the communication setting (see Fig. 7) at
the destination. This graph contains the Tanner graphs of the
primary and interference LDPC codes (see Sec. II-E), as well
as additional nodes, including n state nodes$^{23}$. Each state
node corresponds to the received signal at one time instance.
It is linked to one variable node from each Tanner graph,
each corresponding to a transmitted bit from one of the two
sources at the given time instance. Decoding includes standard
LDPC decoding iterations (see e.g. [47]) as well as variable-to-state
and state-to-variable iterations, which implement an
exchange of information between the two Tanner graphs. For
precise details regarding the computation of the messages
see [51, Sec. 2].
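
The exact message rules are given in [51] and are not reproduced here. As a rough illustration of what a state node can do, the sketch below computes a log-likelihood ratio (LLR) for one source's bit from the received sample and the other decoder's current LLR for its own bit, by marginalizing over the other bit. This form, the LLR sign convention and the function name are assumptions made for the example, and the code does not guard against numerical overflow or underflow for extreme inputs.

```python
import math


def state_to_variable_llr(y, llr_other, h_own, h_other, sigma):
    """LLR of the own bit given y and the other decoder's LLR for its bit.

    Assumed model: y = h_own * x_own + h_other * x_other + noise, with
    x in {+1, -1} and LLR = log P(x = +1) / P(x = -1).
    """
    p_plus = 1.0 / (1.0 + math.exp(-llr_other))  # P(x_other = +1)
    priors = {+1: p_plus, -1: 1.0 - p_plus}

    def likelihood(x_own):
        # Marginalize the Gaussian likelihood over the other source's bit.
        return sum(
            prior * math.exp(-(y - h_own * x_own - h_other * x_other) ** 2 / (2.0 * sigma ** 2))
            for x_other, prior in priors.items()
        )

    return math.log(likelihood(+1) / likelihood(-1))


# At Destination 1 of (39): the primary bit has gain 1 and the interference has gain h.
h, sigma, y = 0.839, 1.075, 0.4          # y is a hypothetical received sample
llr_primary = state_to_variable_llr(y, llr_other=0.0, h_own=1.0, h_other=h, sigma=sigma)
llr_interference = state_to_variable_llr(y, llr_other=llr_primary, h_own=h, h_other=1.0, sigma=sigma)
```

When the other bit is known with certainty the expression reduces to the familiar $2(y - h)/\sigma^2$ of a channel with the interference subtracted, and when the gain of the other source is zero it reduces to the point-to-point BIAWGN LLR.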



22 More precisely (as in Sec. III-A), a sequence of codes generated in this
way is "good" with probability 1, the probability implied by their random
generation.

Fig. 7. An example of the factor graph for an application of soft-IC (state nodes linking the Tanner graph of the primary LDPC code to the Tanner graph of the interference LDPC code).

In this paper, we have adopted a number of attributes of
the design of [51]. Namely, we have assumed that the LDPC
codes used by the two sources have the same block lengths and
edge distributions $(\lambda, \rho)$. Under this assumption, the number
of nodes of any given degree within the Tanner graphs of
the codes is the same. We further assumed that the nodes are
arranged so that the two variable nodes that are linked to each
state node have the same degree. In [51, Sec. 4] this is known
as the no-interleaver hypothesis. Lastly, we assumed parallel
scheduling. This means that decoding iterations at both Tanner
graphs are computed in parallel.

23 In [51] they are called "state-check" nodes.

Analysis of soft-IC-BP is possible by an application of
density evolution, similar to the one that was discussed in
Sec. III-D in the context of BP over the BEC. Once again,
density evolution tracks the distributions of messages exchanged
at the various iterations of soft-IC-BP. In addition
to rightbound and leftbound LDPC iterations, the algorithm
tracks the distributions of variable-to-state and state-to-variable
messages. A distinction is made between the messages exchanged
in the Tanner graph of the primary codeword, and the
messages in the graph of the interference, whose distributions
are expected to be different. Unlike BP over the BEC, the
messages of soft-IC-BP are taken from a large, continuous
alphabet (the real number field), and so their distributions as
tracked by density evolution are defined over this alphabet
(more precisely, a fine grid over the real-number field, as in [47]).

A detailed discussion of density evolution for iterative-MUD 
is available in [6, Sec. IV.A] and [48, Sec. 5.5]. Like [48], 
our computation of the evolution of distributions through 
variable-to-state and state-to-variable iterations is performed 
precisely, rather than by Monte-Carlo simulations as in [6]. 
Unlike [6, Sec. IV.A] and [48, Example 5.34], our reliance on
the above-mentioned no-interleaver hypothesis implies that the
degrees of the variable nodes linked to each state node are not
independent (in fact they are equal). In our implementation
of density evolution, we account for this by considering the
variable-to-state, state-to-variable and following rightbound
iteration as a combined single iteration.

As in the case of standard density evolution, a concentration
theorem exists [6, Proposition 1] that asserts that the realized
bit error rate, with both the primary and interference codewords,
approaches density evolution's prediction in probability,
exponentially in the block length n.

C. Limitations of "Good" Codes 

We now turn our attention to limitations on the performance
of "good" codes. In Sec. IV-A we mentioned that the MUD
and SUD achievable strategies both rely on communication
with "good" codes. The following theorem shows that their
achievable rates ((40) and (41)) bound the performance of
any communication strategy that relies on any set of "good"
codes. These results are stronger than the ones we obtained in
Sec. III-G in the context of erasure relay channels.

Theorem 7: Consider communication over a symmetric BIAWGN
interference channel. Assume the two sources use
equal block length codes taken from "good" code sequences
$\{C_{1,n}\}_{n=1}^{\infty}$ and $\{C_{2,n}\}_{n=1}^{\infty}$, respectively, which have rate R
(see Sec. II-D). Assume the probabilities of decoding error,
under maximum-likelihood decoding, at both destinations,
approach zero with the block length n. Then the following
holds,

$$R \le \max\left( R_{MUD},\ R_{SUD} \right) \qquad (42)$$

The proof of this theorem is provided in Appendix VII.
The proof builds on the converse of the capacity theorem
of multiple-access channels, see e.g. [13, Sec. 14.3.4]. For
example, consider the setting facing Destination 1. Once the
destination has decoded the primary codeword (see Definition 6),
it is able to subtract it. The remaining signal is
equivalent to the output of a point-to-point BIAWGN channel,
whose input is the interference $\mathbf{X}_2$. If R is lower than the
capacity of this channel, then by the "goodness" of $\{C_{2,n}\}_{n=1}^{\infty}$,
complete decoding of the interference is possible. Thus, the
communication setting in this case resembles a multiple-access
scenario, leading to the bound $R \le R_{MUD}$ (see Appendix VII
for the rigorous details).

If R is greater than the capacity of the above point-to-point
BIAWGN channel, then complete decoding of the interference
$\mathbf{X}_2$ is not possible. However, relying on the "goodness" of
$\{C_{2,n}\}_{n=1}^{\infty}$, we are still able to bound its entropy given the
channel output $\mathbf{Y}_1$ and the primary codeword. We apply this
bound in Appendix VII to show that in this case, $R \le R_{SUD}$
must hold.

Remark 3: Note that in communication using a code C, we 
mean that the source simply maps each message to a codeword 
of C, and does not manipulate C, e.g. by combining it with 
another code, as in the Han-Kobayashi strategy [25]. 

Finally, in Sec. V-B below, we provide examples of simple-structured
"bad" LDPC codes, which are capable of communication
at rates that exceed $R_{MUD}$ and $R_{SUD}$, thus surpassing
the performance of "good" codes.

V. Code Design Methods and Numerical Results 

A. Erasure Relay 

As noted in Sec. III-C, application of soft-DF-BP to a
$(\delta_2, \delta_3, C_0)$ erasure relay channel involves specifying the edge
distributions $(\lambda, \rho)$ for the LDPC code, as well as the quantization
noise level $\hat{\delta}_2$, such that the conditions of Theorem 1 are
satisfied (for $\epsilon$ which will be specified later). Given parameters
$(\lambda, \rho, \hat{\delta}_2)$ of such an application, we use sim-DE, as defined
in Sec. III-E, to verify that the first condition of that theorem
is satisfied, and the bounds of Theorem 4 to verify the second
condition. Our objective is to maximize the communication
rate, as measured by the design rate (8) corresponding to
$(\lambda, \rho)$. As benchmarks for comparison, we use $R_{DF}$, $R_{CF}$
and $R_{DF-UB}$, defined by (13), (19) and (33), following our
discussions in Sections III-A and III-G.

To design effective soft-DF-BP parameters, we applied
a semi-heuristic hill climbing algorithm based on Richardson
et al. [49, Sec. IV.A] and [48, Example 4.139]. Once
the parameters were obtained, they were verified by the non-heuristic
methods described above. Our algorithm starts with
an initial admissible $(\lambda, \rho, \hat{\delta}_2)$ triplet, i.e. one that satisfies the
conditions of Theorem 1. It proceeds by iteratively attempting
to improve it, so that at each iteration the design rate is
increased, and the triplet is still admissible. The details of
the algorithm are provided in Appendix VIII-A.
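
The accept-only-if-admissible-and-better loop described above can be sketched generically. The code below is not the algorithm of Appendix VIII-A: the perturbation rule, the function names and the toy problem are all hypothetical. In the setting of this section the objective would be the design rate and the admissibility test would be the verification of the conditions of Theorem 1.

```python
import random


def hill_climb(initial, objective, is_admissible, propose, steps, rng):
    """Keep a candidate only if it is admissible and improves the objective."""
    best = initial
    best_value = objective(best)
    for _ in range(steps):
        candidate = propose(best, rng)
        if is_admissible(candidate) and objective(candidate) > best_value:
            best, best_value = candidate, objective(candidate)
    return best, best_value


# Hypothetical toy problem standing in for the code search: maximise x + y
# over the admissible region x**2 + y**2 <= 1, starting from the origin.
def toy_objective(point):
    return point[0] + point[1]


def toy_admissible(point):
    return point[0] ** 2 + point[1] ** 2 <= 1.0


def toy_propose(point, rng):
    return (point[0] + rng.uniform(-0.05, 0.05), point[1] + rng.uniform(-0.05, 0.05))


best, value = hill_climb((0.0, 0.0), toy_objective, toy_admissible, toy_propose,
                         steps=5000, rng=random.Random(1))
```

Because every accepted candidate is admissible, the final point is admissible as well, which mirrors the property that the designed triplet still satisfies the conditions it was checked against at every step.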

We designed codes for an erasure relay channel with parameters
$\delta_2 = 0.5$, $\delta_3 = 0.82$ and $C_0 = 0.9$ ($C_0$ is measured in
bits, see Sec. II-A). We applied the above design procedure and
obtained edge distributions $(\lambda, \rho)$ corresponding to a design
rate of R = 0.5056. The parameters of the codes are,

$$\lambda_{2,3,4,23,24,100} = (0.2289, 0.04532, 0.2361, 0.233, 0.03178, 0.2249), \quad \rho_{10} = 1, \quad \hat{\delta}_2 = 0.212 \qquad (43)$$

As the degrees of the check-nodes in such $(\lambda, \rho)$ codes are
bounded (and equal to 10), code sequences corresponding to
$(\lambda, \rho)$ are "bad" for the point-to-point BEC$^{24}$. By
Theorem 4, the right-hand-side of (21) is upper bounded by 0.9
for large enough n, and so the second condition of Theorem 1
is satisfied. The erasure rate at the output of soft-DF-BP, as
predicted by sim-DE, is upper bounded by $\epsilon = 1.54 \cdot 10^{-5}$ in
probability with the block length n. By Theorem 1, this means
that the achievable rate with such codes is R = 0.5053.
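
The design rate quoted above can be checked from the edge distributions in (43). The sketch assumes the standard design-rate expression for an edge-perspective pair, $1 - \left(\sum_j \rho_j / j\right) / \left(\sum_i \lambda_i / i\right)$, which is presumably what equation (8) of the paper states; the function name and the dictionary representation are choices made for this example.

```python
def design_rate(lam, rho):
    """Design rate 1 - (sum_j rho_j / j) / (sum_i lambda_i / i).

    lam and rho map a node degree to the fraction of edges attached to
    nodes of that degree (edge-perspective distributions).
    """
    check_side = sum(fraction / degree for degree, fraction in rho.items())
    variable_side = sum(fraction / degree for degree, fraction in lam.items())
    return 1.0 - check_side / variable_side


lam_43 = {2: 0.2289, 3: 0.04532, 4: 0.2361, 23: 0.233, 24: 0.03178, 100: 0.2249}
rho_43 = {10: 1.0}
rate_43 = design_rate(lam_43, rho_43)  # approximately 0.5056
```

The listed fractions sum to one, as expected of an edge distribution, and the resulting value agrees with the design rate of 0.5056 reported for these codes.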

Our benchmarks (see above) for this channel are $R_{DF} = 0.5$,
$R_{CF} = 0.49867$ and $R_{DF-UB} = 0.5$. All benchmarks
were surpassed by the above rate R, which is achievable
by soft-DF-BP. By our discussion in Sec. III-G, $R_{CF}$ and
$R_{DF-UB}$ are likely to upper bound the achievable rates with
any implementation of soft-DF-BP that applies "good" codes.
Thus, our gap from the benchmarks indicates the advantage
of using "bad" codes$^{25}$.

It is interesting to examine additional aspects of the performance
of the "bad" LDPC code sequences corresponding to
the above $(\lambda, \rho)$ pair, in comparison with the performance of
"good" code sequences. In Fig. 8 we consider communication
over a point-to-point BEC. This figure parallels Fig. 1 (Sec. I).
The various curves correspond to the erasure rates at the output
of estimation algorithms at the destination, as functions of
the channel's erasure probability. The first curve corresponds
to the asymptotic erasure rate at the output of MAP estimation
with "good" codes of rate 0.5053. The value plotted is the
asymptotic limit as determined in Theorem 5. The second
curve corresponds to the expected erasure rate at the output
of BP estimation, when applied to communication using a randomly
generated $(\lambda, \rho)$ LDPC code (where $(\lambda, \rho)$ are defined
by (43)), as predicted by density evolution (see Sec. III-D).
The last curve corresponds to uncoded communications, where
estimation cannot improve upon the raw channel output.

The erasure probability of the source-relay link of the above-mentioned
$(\delta_2, \delta_3, C_0)$ channel is 0.5. At this point, the "good"
codes curve evaluates to 0.5, while the $(\lambda, \rho)$ LDPC curve
achieves an erasure rate of 0.3016. Thus, with the $(\lambda, \rho)$ LDPC code,
soft-DF-BP forwards a much better signal $\mathbf{Y}_2^{BP}$ (see Alg. 2) to
the destination than when "good" codes are applied.
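
A curve of the second kind can be explored with the textbook density evolution recursion for the BEC, $x_{\ell+1} = \delta \cdot \lambda(1 - \rho(1 - x_\ell))$. The sketch below assumes this standard recursion and the standard conversion from edge to node perspective; the density evolution of Sec. III-D is not reproduced in this section, so the exact form and the number of iterations used in the paper are not known here, and the function names are hypothetical.

```python
def bec_density_evolution(delta, lam, rho, iterations):
    """Erasure fraction of variable-to-check messages after BP iterations on a BEC(delta)."""
    def poly(dist, z):
        # Edge-perspective polynomial: sum over degrees of dist[d] * z**(d - 1).
        return sum(fraction * z ** (degree - 1) for degree, fraction in dist.items())

    x = delta
    for _ in range(iterations):
        x = delta * poly(lam, 1.0 - poly(rho, 1.0 - x))
    return x


def bit_erasure_rate(delta, lam, rho, iterations):
    """Expected bit erasure rate at the BP output, using node-perspective degrees."""
    x = bec_density_evolution(delta, lam, rho, iterations)
    check_out = 1.0 - sum(f * (1.0 - x) ** (d - 1) for d, f in rho.items())
    norm = sum(f / d for d, f in lam.items())
    # The fraction of variable nodes of degree d is (lam[d] / d) / norm.
    return delta * sum((f / d) / norm * check_out ** d for d, f in lam.items())


lam_43 = {2: 0.2289, 3: 0.04532, 4: 0.2361, 23: 0.233, 24: 0.03178, 100: 0.2249}
rho_43 = {10: 1.0}
erasure_43 = bit_erasure_rate(0.5, lam_43, rho_43, iterations=1000)
```

Since the design rate of these codes exceeds the capacity 0.5 of the source-relay link, the recursion settles at a nonzero fixed point, and the resulting erasure rate lies strictly between zero and the channel erasure probability.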

Another factor that affects the achievable rates at the destination
is the level of quantization noise $\hat{\delta}_2$. In Fig. 9 we
have plotted the required $\hat{\delta}_2$ as a function of the capacity $C_0$
of the relay-destination link. In all curves, $\delta_2$ and $\delta_3$ are fixed
and equal to the values specified above. The first curve in
Fig. 9 corresponds to communication using "good" codes, and
was computed by minimizing $\hat{\delta}_2$ subject to (18). This $\hat{\delta}_2$ equals the
level of noise in applications of CF (see Sec. III-A). By the
discussion in Sec. III-G, it is also likely to be a lower bound
on the noise level in applications of soft-DF-BP that involve
"good" codes$^{26}$. The second curve in Fig. 9 corresponds to
communication using a randomly selected code from the
$(\lambda, \rho)$ LDPC ensemble (where $(\lambda, \rho)$ are defined by (43)), and
was computed by minimizing $\hat{\delta}_2$ subject to $I^+(\hat{\delta}_2) \le C_0$,
where $I^+(\hat{\delta}_2)$ is given by (28). The third curve corresponds to
the naive upper bound of Lemma 1 and is similarly computed
using (26) (ignoring the $o(1)$ term).

24 This follows from the discussion of Sec. I-B, by observing that in the
parity-check matrix that corresponds to the Tanner graph of such $(\lambda, \rho)$ LDPC
codes (see e.g. [8, Sec. II.A]), the average weight of each row is 10.

25 We also experimented with partial-DF (see Sec. I-A). Unfortunately, we
were not able to design an application of the strategy whose performance
exceeds the above rates achieved by DF and CF. A similar difficulty was
reported by [31, Sec. 4.2.7] in the context of full-duplex AWGN channels.
Further optimization of partial-DF is beyond the scope of this work.

The capacity of the relay-destination link in the above-mentioned
$(\delta_2, \delta_3, C_0)$ channel is 0.9. At this point, the "good"
codes curve evaluates to $\hat{\delta}_2 = 0.223$, while our LDPC codes
require at most $\hat{\delta}_2 = 0.212$. Thus, communication of the signal
$\mathbf{Y}_2^{BP}$ from the relay to the destination is possible with a lower
level of distortion when our above "bad" LDPC codes are used
than when "good" codes are applied. It is also interesting to
observe the performance of the curves at higher values of $C_0$.
With our above LDPC code, lossless ($\hat{\delta}_2 = 0$) compression is
possible when $C_0 = 0.9463$. The "good" codes curve requires
$C_0 = 1.41$.




Fig. 8. Erasure rates as functions of the BEC erasure probability $\delta$.



B. Symmetric BIAWGN Interference 

An application of soft-IC-BP for an (h, a) symmetric BIAWGN
interference channel involves specifying the edge
distributions $(\lambda, \rho)$ for the LDPC codes in use. As noted in
Sec. IV-B, we assume both sources use the same edge distributions.
We use density evolution, as discussed in Sec. IV-B,
to verify the codes' performance. Our objective is again to
maximize the design rate (8), corresponding to $(\lambda, \rho)$. As
benchmarks for comparison, we use $R_{\mathrm{MUD}}$ and $R_{\mathrm{SUD}}$, as
defined in Sec. IV-A.

26 Strictly speaking, the level of quantization noise may be lower, because
Theorem 6 requires $R = 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3$ while communication may
be possible if $R < 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3$. However, by the discussion in
Appendix VI-C, the achievable rates remain unaltered even if we assume
the curve indeed lower bounds the required $\hat{\delta}_2$.

Fig. 9. Quantization noise $\hat{\delta}_2$ as a function of the capacity of the
relay-destination link $C_0$.



As in our above discussion of soft-DF-BP, we apply a
variation of the semi-heuristic hill-climbing algorithm of [49]
to design effective codes. In adapting the algorithm, special
care was taken to avoid attraction to local maxima that are
particular to our setting. The details of the algorithm are
provided in Appendix VIII-B.

We designed codes for a symmetric BIAWGN interference
channel with parameters h = 0.839 and a = 1.075. We
applied the above design procedure and obtained the edge
distributions $(\lambda, \rho)$ below.

$$\lambda_{2,3,10,11,55,56,57} = (0.2949,\ 0.2036,\ 0.05943,\ 0.0001219,\ 0.2399,\ 0.09542,\ 0.1065), \qquad \rho_6 = 1 \tag{44}$$

Once again, the degrees of the check-nodes in such $(\lambda, \rho)$
codes are bounded (and equal to 6), and so code sequences
corresponding to $(\lambda, \rho)$ are "bad" for the point-to-point
BIAWGN channel (this follows by the same arguments as in
Sec. V-A above). The bit error rate at the output of soft-IC-BP,
as predicted by density evolution, approaches $4 \cdot 10^{-6}$
in probability with the block length n. This figure refers to
decoding at each destination, of the codeword transmitted by
the corresponding source. The bit error rate in decoding of the
codeword sent from the interfering source approaches 0.062
with n. By the symmetry of the problem, these figures are
identical at both destinations.

Our benchmarks for the above channel are $R_{\mathrm{MUD}} = 0.3237$
and $R_{\mathrm{SUD}} = 0.308$ (measured in bits, see Sec. II-A). By
Theorem 7, these rates upper bound the achievable rates of
applications of "good" codes. The design rate for the code
specified by (44) is 0.3243. This rate exceeds both our benchmarks,
indicating the potential of "bad" codes. The above-mentioned
bit error rate of 0.062, in the decoding of the
interfering codeword, is greater than zero but lower than the bit
error rate with bitwise decoding (i.e., when the code structure
is not exploited), which equals 0.301. Thus, partial decoding
of the interference was achieved.



As noted in Sec. I-A, the best-known rates for the interference
channel are achieved by the Han-Kobayashi (HK)
strategy [25], which, like soft-IC-BP, involves partial decoding.
In Appendix VIII-C we describe an application of HK for the
above channel which is provably capable of communication at
rate 0.333. This rate exceeds that of our above application of
soft-IC-BP. As noted in Sec. I-B, however, this comes at a price with
respect to practical considerations like decoding complexity.
We also verify in Appendix VIII-C that the codes used by
HK to achieve this performance are point-to-point "bad". This
reinforces our insight that "bad" codes have an inherent role
in methods that rely on partial decoding.

VI. Conclusion 

Multi-terminal communications poses a much richer re- 
search problem than traditional (point-to-point) communica- 
tions. While coding for point-to-point channels needs only 
consider the performance at the destination, multi-terminal 
channels offer additional degrees of freedom, by enabling 
partial decoding at non-destination nodes (e.g. relays) as well. 

The approach of partial-DF [12] and HK [25] (see Sec. I),
which involves manipulating randomly-generated codes, is a
natural extension of the traditional analysis of point-to-point 
channels. Over such channels, randomly-generated codes were 
the first "good" codes to be found. Soft-DF and soft-IC, by 
comparison, often rely on simple-structured codes, which over 
point-to-point channels have been shown to be suboptimal 
("bad"). In this paper, we have demonstrated that such codes 
may in fact offer benefits in multi-terminal scenarios, which 
are intrinsically related to their point-to-point "badness". Our 
main contribution has been a rigorous analysis of these bene- 
fits, in terms of achievable communication rates. 

Many open problems remain. Most important, in our view,
are the questions posed in Sec. I-B. Specifically, an analysis of
the tradeoff between achievable rates and computational com- 
plexity is of great practical interest. Tightening of our bounds 
(e.g. Theorem @]i and refinement of our LDPC optimization 
algorithm (Appendix VIII-A) may enable the improvement of
the achievable rates reported in Sec. V, and yield insight on
the capacities of relay and interference channels. Extensions 
of our results to additional relay and interference channels, as 
well as additional network models, are also interesting research 
problems. Specifically, we conjecture that the significance of 
partial decoding increases with the number of network nodes. 

Our focus in this paper on LDPC codes was guided exclusively
by ease of analysis. Our results, however, open the
door to a re-evaluation of other "bad" (non-capacity-achieving)
point-to-point codes, like simple convolutional, Reed-Muller
and Reed-Solomon codes, in multi-terminal scenarios.

Appendix I
Details of the Curves in Figure 1

The MMSE values plotted in Fig. 1 correspond to the
asymptotic normalized MMSE of a sequence of codes. More
precisely, given a code $C$, we define,

$$\mathrm{mmse}(C; \mathrm{SNR}) = E\,\big\|\hat{X}(Y) - X\big\|^2 \tag{45}$$

where X is the transmitted codeword, which is randomly
distributed in $C$, Y is the channel output at the destination
and $\hat{X}(Y)$ is the MMSE estimate of X given Y. Note that we
assume no distinction is made between information and parity
bits, and estimation of the values of all transmitted code bits
is performed.

The curves in Fig. 1 correspond to the normalized MMSE
of sequences of codes. Given a sequence $\mathcal{C} = \{C_n\}_{n=1}^{\infty}$, we
define its normalized MMSE as,

$$\mathrm{mmse}(\mathcal{C}, \mathrm{SNR}) = \lim_{n \to \infty} \frac{1}{n}\, \mathrm{mmse}(C_n, \mathrm{SNR})$$

The curve in Fig. 1 corresponding to uncoded communications
was evaluated as [23, Equation (17)]. In the curve corresponding
to "good" code-sequences, we have defined "goodness"
as in Definition 4 with respect to BIAWGN channels. In the
range SNR < SNR*, where SNR* is the Shannon limit for
rate 1/2, we have relied on the analysis of [45, Equation (14)]
to evaluate the curve. In the range SNR > SNR*, we have
relied on an analysis similar to the one in Appendix VI-A
with respect to estimation over the BEC, in the range $\delta < \delta^*$.
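
As a rough numerical companion to the uncoded curve, the sketch below evaluates the standard closed form for the MMSE of equiprobable ±1 inputs over a real Gaussian channel, mmse(snr) = 1 − E[tanh(snr − sqrt(snr)·Z)] with Z ~ N(0, 1). That this expression coincides with [23, Equation (17)] is an assumption made here; the function name and the integration grid are illustrative only.

```python
import math

def bpsk_mmse(snr, half_width=10.0, steps=4000):
    """MMSE of X in {-1,+1} (equiprobable) from Y = sqrt(snr)*X + Z, Z ~ N(0,1).

    Assumed closed form: 1 - E[tanh(snr - sqrt(snr)*Z)], evaluated with the
    trapezoidal rule on [-half_width, half_width].
    """
    if snr < 0:
        raise ValueError("snr must be non-negative")
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * phi * math.tanh(snr - math.sqrt(snr) * z)
    return 1.0 - h * total

if __name__ == "__main__":
    for snr in (0.0, 0.5, 1.0, 2.0, 4.0):
        print(f"snr={snr:4.1f}  mmse={bpsk_mmse(snr):.4f}")
```

The value decreases from 1 at SNR = 0 and stays below the linear-estimation value 1/(1 + snr), which is the qualitative behavior expected of an uncoded curve.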

The LDPC(2,4) curve corresponds to a sequence of codes
$\{C_n\}_{n=1}^{\infty}$, where $C_n$ was selected at random from the
LDPC(2,4) ensemble of block length n (see Sec. II-E).
The bound on the MMSE was explained in our paper [4,
Sec. III.A].

Appendix II
Proof of Theorem 1

We begin with an overview of our proof. Our main technique
involves focusing on the virtual channel, obtained by
encapsulating soft decoding, as performed at the relay with
soft-DF-BP, into the channel model. This channel is identical
to the relay channel of Sec. III-A, with the exception that the
relay output $Y_{2,i}$ at time i is replaced by $Y_{2,i}^{\mathrm{BP}}$. Having
encapsulated soft decoding into the channel model, the remaining
components of soft-DF-BP now closely resemble CF. Thus, it
would appear that we can apply the results of [12, Theorem 6]
to analyze its performance.

However, the application of the above technique involves
addressing two issues. First, unlike the relay channel output
$Y_2$, the erasures in $Y_2^{\mathrm{BP}}$ are in general not statistically
independent. This means that the channel from components
$X_{1,i}$ to $Y_{2,i}^{\mathrm{BP}}$ is not memoryless, thus violating an assumption
of [12, Theorem 6]. Second, the proof of [12, Theorem 6] only
guarantees that $\hat{Y}_2^{\mathrm{BP}}$, as obtained at the destination, is strongly
typical to (20). Our analysis of LDPC codes will require that
its actual distribution match (20).

These issues are easily addressed by considering a modified
version of soft-DF-BP that uses a code $C^*$, obtained by
concatenating $C$ (see Algorithm 2) with an outer code $C_{\mathrm{out}}$.
Each communication thus involves multiple transmissions of
codewords from $C$ (the inner code). $C^*$ replaces $C$ at all
operations of the modified algorithm (e.g. encoding at the
source), with the exception of soft decoding at the relay,
which is applied independently to each of the concatenated
codewords of $C$. We also assume that the destination applies
joint-typicality decoding (see [12]) to decode $C^*$, rather than
BP decoding.

Analysis is now simplified by examining the transition
probabilities of the virtual outer channel. The source input
alphabet of the channel consists of the codewords of $C$, the
relay output is $Y_2^{\mathrm{BP}}$, the relay-destination channel has capacity
$n \cdot C_0$, and the output at the destination is $Y_3$. Communication
using modified soft-DF-BP involves multiple uses of this
channel.

The virtual outer channel is clearly memoryless, thus ad- 
dressing the first issue above. Furthermore, modified soft-DF- 
BP, as defined over this channel, now matches CF as defined 
in [12, Theorem 6]. Analysis of the joint-typicality decoder 
on which it relies is possible with the standard information- 
theoretic techniques used by [12], thus removing the second 
issue above. The analysis of [12, Theorem 6] (as specialized 
in [27], see Sec. IIII-AI above) guarantees that if (f2TT > holds, we 
can select C out such that the rate 

$$R_{\mathrm{modified}} = \frac{1}{n} \cdot I\big(X_1;\ Y_3, \hat{Y}_2^{\mathrm{BP}}\big) \tag{46}$$

is achievable, where we assume that $X_1$ is uniformly distributed
in $C$, and normalization by n is required because we
are measuring rate in bits per use of the inner channel.

Finally, to evaluate (46), we apply the analysis of soft-DF-BP
in the stochastic channel setup. That is, we consider a
formal scenario, where the assumptions of the setup hold. Relying
on Condition 1 of the theorem, the following inequality
can now be shown to hold,

$$\frac{1}{n} \cdot I\big(X_1;\ Y_3, \hat{Y}_2^{\mathrm{BP}}\big) \ge R \cdot \big(1 - h(\epsilon / R)\big) + o(1) \tag{47}$$

where R is the rate of $C$ and $o(1)$ is a term that approaches zero
with n. The proof of (47) relies on concepts similar to the
proof of the joint source-channel coding theorem (see e.g. [38,
Sec. 10.5]) and is omitted. The argument $\epsilon / R$ to the entropy
function (rather than $\epsilon$ as in [38]) compensates for the fact that
$\epsilon$ is the fraction of erroneous code bits rather than information
bits as in [38].

□

Appendix III
Results for Sec. III-E

A. Proof of Theorem 2

The proof relies on the properties of erasure multiplication
and addition as defined by (5) and (6). Specifically, it is easy to
verify that if $x'$ is degraded with respect to $x$ and $y'$ is degraded
with respect to $y$, then $x' + y'$ is degraded with respect to $x + y$
and $x' \cdot y'$ is degraded with respect to $x \cdot y$. Similarly, $x$ is
degraded with respect to $x \cdot y$ for all $x, y \in \{0, 1, e\}$. We
proceed with the following lemma.
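
Before the lemma, the degradedness properties just stated can be checked exhaustively on the three-symbol alphabet. The sketch below is illustrative only: it assumes that erasure addition returns an erasure whenever either operand is erased (and the XOR otherwise), that erasure multiplication returns any known operand, and that "a is degraded with respect to b" means a is either erased or equal to b. These are working assumptions, not a restatement of the paper's definitions; the function names are hypothetical.

```python
from itertools import product

E = "e"                     # erasure symbol
ALPHABET = (0, 1, E)

def degraded(a, b):
    """Assumed meaning: a carries no more information than b."""
    return a == E or a == b

def e_add(x, y):
    """Check-node style addition: erased if any operand is erased, else XOR."""
    return E if (x == E or y == E) else x ^ y

def e_mul(x, y):
    """Variable-node style multiplication: known if any operand is known."""
    if x == E:
        return y
    if y == E:
        return x
    if x != y:
        raise ValueError("conflicting known values cannot occur over the BEC")
    return x

def consistent(x, y):
    """Two observations of the same bit never disagree over an erasure channel."""
    return x == E or y == E or x == y

def check_properties():
    for x, y, xp, yp in product(ALPHABET, repeat=4):
        if not (degraded(xp, x) and degraded(yp, y)):
            continue
        # Addition preserves degradedness.
        if not degraded(e_add(xp, yp), e_add(x, y)):
            return False
        if consistent(x, y):
            # Multiplication preserves degradedness.
            if not degraded(e_mul(xp, yp), e_mul(x, y)):
                return False
            # x is degraded with respect to x * y.
            if not degraded(x, e_mul(x, y)):
                return False
    return True

if __name__ == "__main__":
    print("all properties hold:", check_properties())
```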

Lemma 2: Consider an instance of the application of BP
(Algorithm 1) over the point-to-point BEC. Let $r_{ij}^{(\ell)}$ be a
rightbound message computed at some intermediate iteration
$\ell = 0, \ldots, t-1$, and $y_i^{\mathrm{BP}}$ the final decision later computed at the
node i that produced $r_{ij}^{(\ell)}$. Then $r_{ij}^{(\ell)}$ is degraded with respect
to $y_i^{\mathrm{BP}}$.

Proof: We will actually prove a stronger result: $r_{ij}^{(\ell-1)}$ is
degraded with respect to $r_{ij}^{(\ell)}$ for all $\ell = 1, \ldots, t-1$, and $r_{ij}^{(t-1)}$
is degraded with respect to $y_i^{\mathrm{BP}}$. The desired result will follow
by the obvious transitivity of degradedness.

Our proof follows by induction on the iteration number $\ell$.

We start by comparing $r_{ij}^{(0)}$ and $r_{ij}^{(1)}$. By the rightbound update
rule of Algorithm 1, $r_{ij}^{(0)}$ equals the channel output $y_i$, while
$r_{ij}^{(1)}$ is obtained by multiplying $y_i$ with some other components.
The result now follows by the above-mentioned property of
erasure multiplication.

We proceed to examine $r_{ij}^{(\ell-1)}$ and $r_{ij}^{(\ell)}$ for $\ell = 2, \ldots, t-1$.
By the rightbound update rule, both are functions of leftbound
messages across the same edges, but at different iterations
($\{l_{j'i}^{(\ell-1)}\}_{j' \in \mathcal{N}(i) \setminus \{j\}}$ and $\{l_{j'i}^{(\ell)}\}_{j' \in \mathcal{N}(i) \setminus \{j\}}$,
respectively), as well as the same $y_i$. If we could prove that each
message $l_{j'i}^{(\ell-1)}$ is degraded with respect to the corresponding
$l_{j'i}^{(\ell)}$, the result would follow by the above-mentioned property
of erasure multiplication. Each such leftbound message is computed
by the leftbound update rule of Algorithm 1. Again, both are functions
of rightbound messages across the same edges, but at different
iterations ($\{r_{i'j'}^{(\ell-2)}\}_{i' \in \mathcal{N}(j') \setminus \{i\}}$ and
$\{r_{i'j'}^{(\ell-1)}\}_{i' \in \mathcal{N}(j') \setminus \{i\}}$, respectively). By induction,
each message $r_{i'j'}^{(\ell-2)}$ is degraded with respect to $r_{i'j'}^{(\ell-1)}$,
and the result now follows by the above-mentioned property of
erasure addition.

Finally, the proof of the degradedness of $r_{ij}^{(t-1)}$ with respect
to $y_i^{\mathrm{BP}}$ follows by similar arguments and is omitted. □
We now introduce some notation. By construction, the first 
components x^'^ and lj 2 '^ of the message pairs computed by 
sim-BP are identical to the messages computed by the relay's 
BP decoder with soft-DF-BP. The same does not hold for the 
other components of sim-BP and soft-DF-BP's messages at 
the destination. We let r£?'*' and L-j; denote the messages 
computed with soft-DF-BP at the destination, and r^.' and 



the corresponding components of sim-BP's message 
pairs. 

( 3 £) 

The proof proceeds by showing that r ^ ' is degraded with 



J3,£) 



respect to t^'~' for all £, i,j. With the above notation, r' t] 



,(3,£) 



is computed by d24l >. replacing r^' £ ' and by x'^' and 



(3,*) 



respectively, x\j ' is computed by (|22T >. By Lemma [2] 
is degraded with respect to for all £,i,j. By 



At.) 

ij 

and d25t and the above-mentioned properties of multiplication, 

(2 £) 

it follows that f^-' is degraded with respect to Each 

message I'^f, j' € A/"(i)\{j} can be shown to be degraded 

with respect to 1^ by induction, using similar arguments to 
the ones used in the proof of Lemma [2] above. The desired 



degradedness of r'^' with respect to t\j' z> now follows by 
the above-mentioned properties of multiplication. 

Finally, the proof of degradedness of y'^ t with respect to 
yf i now follows from the above results by similar arguments 
and is omitted. 

□ 



B. Details of Simultaneous Density Evolution 

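The description below relies on the discussion of Sec. III-E.
The algorithm is based on the concepts of density evolution
over point-to-point channels, as described in Sec. III-D. Like
density evolution, it relies on the all-zero codeword assumption,
and thus the distributions $P_R^{(\ell)}(x_2, x_3)$ and $P_L^{(\ell)}(x_2, x_3)$
that it tracks are confined to the range $\{0, e\} \times \{0, e\}$.

Algorithm 5 (Simultaneous Density Evolution (sim-DE)):

1) Iterations. Perform the following steps, alternately, a
pre-determined t times.

• Rightbound iteration number $\ell = 0, \ldots, t-1$. Set
$P_R^{(\ell)} = T(\tilde{P}_R^{(\ell)})$ where,

$$\tilde{P}_R^{(\ell)} = \begin{cases} P_{\delta_2} \cdot P_{\delta_3}, & \ell = 0, \\ \big(P_{\delta_2} \cdot P_{\delta_3}\big) \odot \sum_i \lambda_i \big(P_L^{(\ell)}\big)^{\odot (i-1)}, & \ell > 0, \end{cases} \tag{48}$$

where $P_\delta(\cdot)$ is defined for $\delta \in [0, 1]$, $x \in \{0, e\}$ by,

$$P_\delta(x) = \begin{cases} \delta, & x = e, \\ 1 - \delta, & x = 0. \end{cases}$$

$P_{\delta_2} \cdot P_{\delta_3}$ is defined by (51) on the following page
and the $\odot$ operation is defined by (52). $P^{\odot i} = P \odot P \odot \cdots \odot P$,
i.e., the repeated application of the $\odot$ operation a number i times on P.
Addition and multiplication by $\lambda_i$ in (48) are performed
componentwise (see (54)). Lastly, $T(\cdot)$ is defined by (55).

• Leftbound iteration number $\ell = 1, \ldots, t$. $P_L^{(\ell)}$ is
obtained by,

$$P_L^{(\ell)} = \sum_j \rho_j \big(P_R^{(\ell-1)}\big)^{\oplus (j-1)} \tag{49}$$

where the operation $\oplus$ is defined by (53), and where
$P_1$ and $P_2$ are probability functions over $\{0, e\}^2$.
$P^{\oplus i}$ is defined in the same way as $P^{\odot i}$.

2) Final Decisions. Set $P^{(\mathrm{Final})} = T(\tilde{P}^{(\mathrm{Final})})$ where,

$$\tilde{P}^{(\mathrm{Final})} = \big(P_{\delta_2} \cdot P_{\delta_3}\big) \odot \sum_i \tilde{\lambda}_i \big(P_L^{(t)}\big)^{\odot i} \tag{50}$$

where $\tilde{\lambda}_i$ is as defined in Sec. III-E.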



Sim-DE follows the same concepts of density evolution as developed
by Richardson et al. [47] and discussed in Sec. III-D.
Its computations follow the expressions for sim-BP. Like
standard density evolution, the incoming message pairs at each
node, on which the computations for the outgoing pairs rely,
are assumed to be mutually independent (although the components
within each such pair are in general dependent). At variable
nodes, the pairs are also assumed to be independent of the
node's channel outputs $(Y_{2,i}, Y_{3,i}, \hat{E}_{2,i})$. These assumptions
are justified by similar arguments to the ones in Sec. III-D,
relying on the fact that conditioned on the transmission of the
all-zero codeword, components $(Y_{2,i}, Y_{3,i}, \hat{E}_{2,i})$ corresponding
to different indices i are mutually independent.

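The computations in a rightbound iteration have been simplified
by introducing an intermediate step. Rather than determine
$P_R^{(\ell)}$ directly, the algorithm first computes an auxiliary
value $\tilde{P}_R^{(\ell)}$, which corresponds to a pair $(r_{ij}^{(2,\ell)}, \tilde{r}_{ij}^{(3,\ell)})$ where
$\tilde{r}_{ij}^{(3,\ell)}$ is defined by,

$$\tilde{r}_{ij}^{(3,\ell)} = \begin{cases} y_{3,i}, & \ell = 0, \\ y_{3,i} \cdot \prod_{j' \in \mathcal{N}(i) \setminus \{j\}} l_{j'i}^{(3,\ell)}, & \ell > 0. \end{cases}$$

That is, $\tilde{r}_{ij}^{(3,\ell)}$ coincides with (24), except that the multiplication
by the relay-derived factor (the factor accounted for by $T(\cdot)$ in (55)) is omitted.

BENNATAN ET AL. : IN PRAISE OF BAD CODES FOR MULTI-TERMINAL COMMUNICATIONS

$$P = P_{\delta_2} \cdot P_{\delta_3}: \quad P(x_2, x_3) = P_{\delta_2}(x_2) \cdot P_{\delta_3}(x_3) \qquad \forall x_2, x_3 \in \{0, e\} \tag{51}$$

$$P = P_1 \odot P_2: \quad P(x_2, x_3) = \sum_{\substack{x_2^1, x_3^1, x_2^2, x_3^2 \in \{0, e\}: \\ x_2^1 \cdot x_2^2 = x_2,\ x_3^1 \cdot x_3^2 = x_3}} P_1(x_2^1, x_3^1) \cdot P_2(x_2^2, x_3^2) \qquad \forall x_2, x_3 \in \{0, e\} \tag{52}$$

$$P = P_1 \oplus P_2: \quad P(x_2, x_3) = \sum_{\substack{x_2^1, x_3^1, x_2^2, x_3^2 \in \{0, e\}: \\ x_2^1 + x_2^2 = x_2,\ x_3^1 + x_3^2 = x_3}} P_1(x_2^1, x_3^1) \cdot P_2(x_2^2, x_3^2) \qquad \forall x_2, x_3 \in \{0, e\} \tag{53}$$

$$P = \alpha_1 P_1 + \alpha_2 P_2: \quad P(x_2, x_3) = \alpha_1 P_1(x_2, x_3) + \alpha_2 P_2(x_2, x_3) \qquad \forall x_2, x_3 \in \{0, e\} \tag{54}$$

$$P = T(\tilde{P}): \quad P(x_2, x_3) = \sum_{\substack{\tilde{x}_3, \hat{e}_2 \in \{0, e\}: \\ \tilde{x}_3 \cdot (x_2 + \hat{e}_2) = x_3}} \tilde{P}(x_2, \tilde{x}_3) \cdot P_{\hat{\delta}_2}(\hat{e}_2) \qquad \forall x_2, x_3 \in \{0, e\} \tag{55}$$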



The weighted sums by $\lambda_i$, $\rho_j$ and $\tilde{\lambda}_i$ in (48), (49) and (50),
respectively, follow from the random construction of the
computation graph, and are justified by the same arguments
as [49, Expression (8)].
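
The pairwise operations that sim-DE applies to distributions over {0, e} × {0, e} can be sketched as follows. This is an illustrative reading of the product distribution and of the ⊙ and ⊕ operations (variable-node and check-node combining under the all-zero codeword assumption), not the authors' implementation; the T(·) step and the degree-weighted sums are omitted, and all function names are hypothetical.

```python
from itertools import product

E = "e"
SYMS = (0, E)                       # all-zero codeword: each value is 0 or erased
PAIRS = tuple(product(SYMS, SYMS))  # (x2, x3) pairs

def mul(a, b):
    """Variable-node combining: erased only if both inputs are erased."""
    return E if (a == E and b == E) else 0

def add(a, b):
    """Check-node combining: erased if either input is erased."""
    return E if (a == E or b == E) else 0

def product_dist(d2, d3):
    """Independent erasures with probabilities d2 (relay) and d3 (destination)."""
    def p(d, x):
        return d if x == E else 1.0 - d
    return {(x2, x3): p(d2, x2) * p(d3, x3) for x2, x3 in PAIRS}

def combine(P1, P2, op):
    """Distribution of the componentwise `op` of two independent message pairs."""
    out = dict.fromkeys(PAIRS, 0.0)
    for (a2, a3), (b2, b3) in product(PAIRS, PAIRS):
        out[(op(a2, b2), op(a3, b3))] += P1[(a2, a3)] * P2[(b2, b3)]
    return out

def odot(P1, P2):
    return combine(P1, P2, mul)

def oplus(P1, P2):
    return combine(P1, P2, add)

def marginal_erasure(P, idx):
    """Erasure probability of component idx (0 for x2, 1 for x3)."""
    return sum(prob for pair, prob in P.items() if pair[idx] == E)

if __name__ == "__main__":
    P = product_dist(0.3, 0.5)
    print(marginal_erasure(odot(P, P), 0), marginal_erasure(oplus(P, P), 1))
```

On product inputs the marginals reduce to the familiar point-to-point BEC density-evolution recursions (δ² at a variable node, 1 − (1 − δ)² at a check node), which is a convenient consistency check; the interest of sim-DE lies in tracking the joint distribution when the two components are dependent.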

Appendix IV
Proof of Lemma 1

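We begin by writing,

$$I\big(Y_2^{\mathrm{BP}}; \hat{Y}_2^{\mathrm{BP}} \mid Y_3\big) = H\big(\hat{Y}_2^{\mathrm{BP}} \mid Y_3\big) - H\big(\hat{Y}_2^{\mathrm{BP}} \mid Y_2^{\mathrm{BP}}, Y_3\big) \tag{56}$$

Focusing on the first term on the right hand side of (56),

$$\begin{aligned} H\big(\hat{Y}_2^{\mathrm{BP}} \mid Y_3\big) &\overset{(a)}{=} H\big(\hat{Y}_2^{\mathrm{BP}}, \hat{E}_2^{\mathrm{BP}} \mid Y_3\big) \\ &= H\big(\hat{Y}_2^{\mathrm{BP}} \mid \hat{E}_2^{\mathrm{BP}}, Y_3\big) + H\big(\hat{E}_2^{\mathrm{BP}} \mid Y_3\big) \\ &\overset{(b)}{=} H\big(\hat{Y}_2^{\mathrm{BP}} \mid \hat{E}_2^{\mathrm{BP}}, Y_3\big) + H\big(\hat{E}_2^{\mathrm{BP}}\big) \end{aligned} \tag{57}$$

where in (a), we have defined $\hat{E}_2^{\mathrm{BP}}$ as a random vector whose
components are derived from $\hat{Y}_2^{\mathrm{BP}}$,

$$\hat{E}_{2,i}^{\mathrm{BP}} = \begin{cases} e, & \hat{Y}_{2,i}^{\mathrm{BP}} = e, \\ 0, & \hat{Y}_{2,i}^{\mathrm{BP}} \ne e. \end{cases} \tag{58}$$

To justify (b), we argue that $\hat{E}_2^{\mathrm{BP}}$ is independent of $Y_3$. To see
this, first observe that by the above definitions and (20), the
following holds for $i = 1, \ldots, n$,

$$\hat{E}_{2,i}^{\mathrm{BP}} = E_{2,i}^{\mathrm{BP}} + \hat{E}_{2,i} \tag{59}$$

where $E_{2,i}^{\mathrm{BP}}$ is derived from $Y_{2,i}^{\mathrm{BP}}$ in the same way as $\hat{E}_{2,i}^{\mathrm{BP}}$ was
derived from $\hat{Y}_{2,i}^{\mathrm{BP}}$, and $\hat{E}_{2,i}$ is simply the erasure noise over
the channel. By this definition, $E_2^{\mathrm{BP}}$ specifies
the set of erased indices at the output of the relay's BP, and is
thus a function of $E_2$, the erasure noise on the channel from
the source to the relay. Both $E_2$ and $\hat{E}_2$ are independent of
$Y_3$, and equality (b) now follows.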



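We now bound the first term on the right hand side of (57).

$$\begin{aligned} H\big(\hat{Y}_2^{\mathrm{BP}} \mid \hat{E}_2^{\mathrm{BP}}, Y_3\big) &\le \sum_{i=1}^{n} H\big(\hat{Y}_{2,i}^{\mathrm{BP}} \mid \hat{E}_{2,i}^{\mathrm{BP}}, Y_{3,i}\big) \\ &\overset{(a)}{=} \sum_{i=1}^{n} H\big(X_{1,i} \mid \hat{E}_{2,i}^{\mathrm{BP}} = 0, Y_{3,i}\big) \cdot \Pr\big[\hat{E}_{2,i}^{\mathrm{BP}} = 0\big] \\ &\overset{(b)}{=} \sum_{i=1}^{n} H\big(X_{1,i} \mid \hat{E}_{2,i}^{\mathrm{BP}} = 0, Y_{3,i} = e\big) \cdot \Pr[Y_{3,i} = e] \cdot \Pr\big[\hat{E}_{2,i}^{\mathrm{BP}} = 0\big] \\ &\overset{(c)}{\le} \sum_{i=1}^{n} 1 \cdot \delta_3 \cdot \big(1 - \hat{\eta}_{2,i}^{\mathrm{BP}}\big) \\ &= n \Big\{ \delta_3 \cdot \Big[ 1 - \Big( \frac{1}{n} \sum_{i=1}^{n} \eta_{2,i}^{\mathrm{BP}} \Big) \circ \hat{\delta}_2 \Big] \Big\} \\ &\overset{(d)}{=} n \big\{ \delta_3 \cdot \big[ 1 - \delta_2^{\mathrm{BP}} \circ \hat{\delta}_2 \big] + o(1) \big\} \end{aligned} \tag{60}$$

In (a), we have relied on the fact that if $\hat{E}_{2,i}^{\mathrm{BP}} = e$ then by (58),
$\hat{Y}_{2,i}^{\mathrm{BP}} = e$ with probability 1 and so $H(\hat{Y}_{2,i}^{\mathrm{BP}} \mid \hat{E}_{2,i}^{\mathrm{BP}} = e, Y_{3,i}) = 0$.
If $\hat{E}_{2,i}^{\mathrm{BP}} = 0$ then $\hat{Y}_{2,i}^{\mathrm{BP}} = X_{1,i}$, where $X_{1,i}$ is a component of
the transmitted source vector, $X_1$. In (b), we have observed
that if $Y_{3,i} = x$ where $x \in \{0, 1\}$, then $X_{1,i} = x$ with
probability 1, and so $H(X_{1,i} \mid \hat{E}_{2,i}^{\mathrm{BP}} = 0, Y_{3,i} = x) = 0$. In
(c), we have observed that since $X_{1,i}$ is defined over $\{0, 1\}$,
$H(X_{1,i} \mid \hat{E}_{2,i}^{\mathrm{BP}} = 0, Y_{3,i} = e) \le 1$. We have also evaluated
$\Pr[Y_{3,i} = e] = \delta_3$. Finally, we defined $\hat{\eta}_{2,i}^{\mathrm{BP}} = \Pr[\hat{E}_{2,i}^{\mathrm{BP}} = e]$
and $\eta_{2,i}^{\mathrm{BP}} = \Pr[E_{2,i}^{\mathrm{BP}} = e]$.
By (59), and the fact that $\hat{E}_{2,i}$ is distributed as Erasure($\hat{\delta}_2$) (see
Sec. III-C), we have $\Pr[\hat{E}_{2,i}^{\mathrm{BP}} = e] = \eta_{2,i}^{\mathrm{BP}} \circ \hat{\delta}_2$.
Observe that each $\eta_{2,i}^{\mathrm{BP}}$ is in fact a function of the code $C$,
which was randomly selected from the $(\lambda, \rho)$ ensemble. Thus,
it is a random variable. In (d), we have relied on the following
derivation,

SUBMITTED TO THE IEEE TRANSACTIONS ON INFORMATION THEORY

$$\begin{aligned} \frac{1}{n} \sum_{i=1}^{n} \eta_{2,i}^{\mathrm{BP}} &\overset{(a)}{=} \frac{1}{n} \sum_{i=1}^{n} \Pr\big[E_{2,i}^{\mathrm{BP}} = e\big] \overset{(b)}{=} \frac{1}{n} \sum_{i=1}^{n} E\big[\mathbb{1}_{E_{2,i}^{\mathrm{BP}} = e}\big] \\ &\overset{(c)}{=} E\big[D_2^{\mathrm{BP}}\big] \overset{(d)}{=} \delta_2^{\mathrm{BP}} + o(1) \end{aligned} \tag{61}$$

In (a) we have invoked the definition of $\eta_{2,i}^{\mathrm{BP}}$. In (b), $\mathbb{1}_{E_{2,i}^{\mathrm{BP}} = e}$
is an indicator random variable, which equals 1 if $E_{2,i}^{\mathrm{BP}} = e$.
The expectation is over the channel transitions, but the code
$C$ is assumed to be fixed. In (c), $D_2^{\mathrm{BP}}$ is the realized erasure
rate at the output of BP at the relay (see Definition 2). In (d)
we have relied on Corollary 1 (see Appendix IV-A below).
This bound holds for large enough n with probability at least
$1 - \exp(-\beta/2 \cdot n^{1/3})$ for $\beta > 0$, thus complying with the
conditions of Lemma 1.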

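We now turn to bound the second term on the right hand
side of (57).

$$\begin{aligned} H\big(\hat{E}_2^{\mathrm{BP}}\big) &\le \sum_{i=1}^{n} H\big(\hat{E}_{2,i}^{\mathrm{BP}}\big) \overset{(a)}{=} \sum_{i=1}^{n} h\big(\hat{\eta}_{2,i}^{\mathrm{BP}}\big) \\ &\overset{(b)}{\le} n \cdot h\Big( \frac{1}{n} \sum_{i=1}^{n} \hat{\eta}_{2,i}^{\mathrm{BP}} \Big) \overset{(c)}{=} n \big[ h\big(\delta_2^{\mathrm{BP}} \circ \hat{\delta}_2\big) + o(1) \big] \end{aligned} \tag{62}$$

In (a), $\hat{\eta}_{2,i}^{\mathrm{BP}}$ is defined as above. In (b) we have applied Jensen's
inequality, relying on the concavity of the entropy function. In
(c) we have relied on (61) and invoked the continuity of $h(\cdot)$.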

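We now turn to evaluate the second term on the right hand
side of (56).

$$\begin{aligned} H\big(\hat{Y}_2^{\mathrm{BP}} \mid Y_2^{\mathrm{BP}}, Y_3\big) &\overset{(a)}{=} H\big(\hat{Y}_2^{\mathrm{BP}} \mid Y_2^{\mathrm{BP}}\big) \\ &\overset{(b)}{=} \sum_{i=1}^{n} H\big(\hat{Y}_{2,i}^{\mathrm{BP}} \mid Y_2^{\mathrm{BP}}, \hat{Y}_{2,1}^{\mathrm{BP}}, \ldots, \hat{Y}_{2,i-1}^{\mathrm{BP}}\big) \\ &\overset{(c)}{=} \sum_{i=1}^{n} H\big(\hat{Y}_{2,i}^{\mathrm{BP}} \mid Y_{2,i}^{\mathrm{BP}}\big) \overset{(d)}{=} \sum_{i=1}^{n} h\big(\hat{\delta}_2\big) \big(1 - \eta_{2,i}^{\mathrm{BP}}\big) \\ &= n \cdot h\big(\hat{\delta}_2\big) \Big( 1 - \frac{1}{n} \sum_{i=1}^{n} \eta_{2,i}^{\mathrm{BP}} \Big) \overset{(e)}{=} n \big[ h\big(\hat{\delta}_2\big) \big(1 - \delta_2^{\mathrm{BP}}\big) + o(1) \big] \end{aligned} \tag{63}$$

where in (a) we have relied on the fact that the three random
vectors on both sides of the equation form a Markov chain:
$Y_3 \leftrightarrow Y_2^{\mathrm{BP}} \leftrightarrow \hat{Y}_2^{\mathrm{BP}}$. In (b), we have simply applied the chain
rule for entropy. In (c), we have relied on the fact that the
random variables on the previous line form a Markov chain in
which $\hat{Y}_{2,i}^{\mathrm{BP}}$ depends on the remaining variables only through
$Y_{2,i}^{\mathrm{BP}}$. In (d), we have relied on the
observation that if $Y_{2,i}^{\mathrm{BP}} = e$, then $\hat{Y}_{2,i}^{\mathrm{BP}} = e$ with probability 1
and thus $H(\hat{Y}_{2,i}^{\mathrm{BP}} \mid Y_{2,i}^{\mathrm{BP}} = e) = 0$, and if $Y_{2,i}^{\mathrm{BP}} = x \in \{0, 1\}$ then
$\hat{Y}_{2,i}^{\mathrm{BP}} = x$ with probability $1 - \hat{\delta}_2$ and $\hat{Y}_{2,i}^{\mathrm{BP}} = e$ with probability
$\hat{\delta}_2$. We have also defined $\eta_{2,i}^{\mathrm{BP}}$ as above. By definition, $E_{2,i}^{\mathrm{BP}} = e$
if and only if $Y_{2,i}^{\mathrm{BP}} = e$ and so $\eta_{2,i}^{\mathrm{BP}} = \Pr[Y_{2,i}^{\mathrm{BP}} = e]$. In (e) we
have applied (61).

Finally, combining (56), (57), (60), (62) and (63), we obtain
our desired (26).

□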

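A. Analysis of $D_2^{\mathrm{BP}}$

Lemma 3: Let $C$ and $\delta_2^{\mathrm{BP}}$ be defined as in Lemma 1. Let
$D_2^{\mathrm{BP}}$ be the realized erasure rate in an application of BP at the
relay (see Definition 2). Then the following holds for large
enough n, with probability at least $1 - \exp(-\beta/2 \cdot n^{1/3})$ (the
probability being over the random selection of $C$),

$$\Pr\Big[ \big| D_2^{\mathrm{BP}} - \delta_2^{\mathrm{BP}} \big| > n^{-1/3} \ \Big|\ \text{The code } C \text{ is used} \Big] < 2 e^{-\beta/2 \cdot n^{1/3}} \tag{64}$$

where $\beta > 0$ is some constant, dependent on $\lambda$, $\rho$ and t.

Proof: By [47, Theorem 2], there exist constants $\beta, \gamma > 0$,
which are dependent on $\lambda$, $\rho$, t (where t is the number of
BP iterations performed), such that for all $\epsilon > 0$ and integer
$n > 0$ satisfying $n > 2\gamma/\epsilon$,

$$\Pr\Big[ \big| D_2^{\mathrm{BP}} - \delta_2^{\mathrm{BP}} \big| > \epsilon \Big] < 2 e^{-\beta \epsilon^2 n} \tag{65}$$

Letting $\epsilon = n^{-1/3}$, we obtain that for all $n > (2\gamma)^{3/2}$,

$$\Pr\Big[ \big| D_2^{\mathrm{BP}} - \delta_2^{\mathrm{BP}} \big| > n^{-1/3} \Big] < 2 e^{-\beta \cdot n^{1/3}} \tag{66}$$

The random space in (65) and (66) is comprised of the random
channel transitions as well as the random selection of the code
from the $(\lambda, \rho)$ ensemble. For a fixed code $C$ let $P(C)$ denote
the left hand side of (66) conditioned on the use of $C$. That is,

$$P(C) = \Pr\Big[ \big| D_2^{\mathrm{BP}} - \delta_2^{\mathrm{BP}} \big| > n^{-1/3} \ \Big|\ \text{The code } C \text{ is used} \Big]$$

The random space in $P(C)$ consists of the channel transitions
only. $P(C)$ itself is a random variable which depends on the
randomly selected $C$. We now use Markov's inequality to
bound the probability that $P(C)$ is very large.

$$\begin{aligned} \Pr\Big[ P(C) \ge 2 e^{-\beta/2 \cdot n^{1/3}} \Big] &\overset{(a)}{\le} \frac{E[P(C)]}{2 e^{-\beta/2 \cdot n^{1/3}}} \overset{(b)}{=} \frac{\Pr\big[ | D_2^{\mathrm{BP}} - \delta_2^{\mathrm{BP}} | > n^{-1/3} \big]}{2 e^{-\beta/2 \cdot n^{1/3}}} \\ &\overset{(c)}{<} \frac{2 e^{-\beta \cdot n^{1/3}}}{2 e^{-\beta/2 \cdot n^{1/3}}} = e^{-\beta/2 \cdot n^{1/3}} \end{aligned}$$

(a) follows by Markov's inequality. The expectation on the
right hand side is over the random selection of the code $C$.
(b) follows by the law of total probability and the definition
of $P(C)$, and (c) follows by (66). The result now follows. □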

Corollary 1: Let $C$, $\delta_2^{\mathrm{BP}}$ and $D_2^{\mathrm{BP}}$ be defined as in Lemma 3
and let $E[D_2^{\mathrm{BP}}]$ denote the expected value of $D_2^{\mathrm{BP}}$, for a fixed
value of $C$. Then the following holds for large enough n with
probability at least $1 - \exp(-\beta/2 \cdot n^{1/3})$,



Pr 



E[D B 2 



where $\beta > 0$ is defined as in Lemma 3.

The corollary follows immediately from Lemma 3 by the
observation that $D_2^{\mathrm{BP}}$ is confined to [0, 1].

Appendix V 
Proof of TheoremO 

Our proof is based on the proof of Lemma 1 (Appendix IV
above). As noted in Sec. III-F, our bound improves upon
the bound of Lemma 1 by exploiting dependencies between
the components of $Y_2^{\mathrm{BP}}$, the output of BP at the relay with
soft-DF-BP. Specifically, in Appendix V-A below we will
exploit dependencies between bits discovered by BP (see
Sec. III-F) to tighten the bound (60) on $H(\hat{Y}_2^{\mathrm{BP}} \mid \hat{E}_2^{\mathrm{BP}}, Y_3)$. In
Appendix V-B we exploit dependencies between erasures at
the output of BP, to produce a bound on $H(\hat{E}_2^{\mathrm{BP}})$ which is
sometimes tighter than (62).

The proof of the theorem will be obtained by applying the
bounds in Appendices V-A and V-B in the following way.
1/n • I(YJ; Yf | Y 3 ) is upper bounded by I+(S 2 ) + o(l) 
by (O, (|57]i, ((62]i and (|63]l from the proof of Lemma [T] 
and (|69) (Appendix E£] below). 1/n ■ I(Yf;Yf | Y 3 ) is 
upper bounded by I 2 (8 2 ) + o(l) by a similar set equations, 
replacing (l62l with (T78b (Appendix IV-BI below). Finally, the 
l.d.f. operation in (|28T > is justified in Appendix IV-DI 

A. Upper Bound on $H(\hat{Y}_2^{\mathrm{BP}} \mid \hat{E}_2^{\mathrm{BP}}, Y_3)$

As noted above, our bound in this section applies dependencies
that exist between bits that were discovered by BP.
An outline of the main idea behind the proof is provided in
Sec. III-F.

We begin with the string of equations ending with (67) on
the following page. In (a), $E_2$ is defined as $\hat{E}_2^{\mathrm{BP}}$ was defined
in (58), replacing $\hat{Y}_2^{\mathrm{BP}}$ with $Y_2$. In this equation, we have relied
on the Markov chain relation between the random variables,
$Y_3 \leftrightarrow \hat{Y}_2^{\mathrm{BP}} \leftrightarrow \hat{E}_2^{\mathrm{BP}} \leftrightarrow E_2$. In (b) we have applied the definition
of conditional entropy: the expectation is over the variable $\mathcal{E}_2$,
which is defined to be distributed identically as $E_2$. In (c) we
have applied the chain rule for entropy. We assume, without
loss of generality, that the components of $\hat{Y}_2^{\mathrm{BP}}$ are ordered
in the order that they would have been discovered by BP in
its equivalent simplified formulation, Algorithm 4 (Sec. III-F),
had the channel erasures corresponded to $\mathcal{E}_2$. That is, the
first components are the ones revealed at iteration 0 of the
algorithm (i.e., not erased by the channel), they are followed
by the components that the algorithm revealed at iteration
1, and so forth. Components that were not revealed at any
iteration are ordered last. In (d) we have separated the sums
of components that were revealed by the channel, components
that were revealed at iterations 1 and above of Simplified BP,
and components not revealed by Simplified BP. $\mathcal{D}_2$ denotes the
erasure rate of $\mathcal{E}_2$ (Definition 2), and $\mathcal{D}_2^{\mathrm{BP}}$ denotes the erasure
rate of $Y_2^{\mathrm{BP}}$ at the output of BP, and is distributed as $D_2^{\mathrm{BP}}$ (see
Appendix IV-A).


We now examine the three sums on the right hand side 
of d(57l i. The desired exploitation of the dependencies between 
the bits discovered by BP will take place in the second sum, 
which we will examine last. The third sum is easily evaluated 
to equal zero. This is because j — e with probability 1 for 
all components of the sum, which follows from (f20b because 
Y 2 B \ = e at these components. Turning to the components of 
the first sum, let i £ {1, (1 — T>2)n\. 

H(Y% I Y 2 % t%, E 2 = £2, Y 3 ) < 

<H(Y™\E 2 *.,Y 3ti ,E 2 = £ 2 ) 
= H(Y% I Ef ti = e, y 3 ,i) Pr[£& = e | E 2 = £ 2 ] + 

+H(Y 2 ™ I Eft = 0, Y 3ii ) Pr[£- = | E 2 = £ 2 ] 



^ H(X 1A l y 3ii ) • (1 - 8 2 
<8 3 (l-8 2 ) 



(68) 



In (a), we have reduced the conditions on the entropy to 
obtain an upper bound on its values. In (b) we have applied 
-ff (Y 2 BP I E 2 w i = e, Y 3i i) = 0, which holds because conditioned 
on = e we have that y 2 BP = e with probability 1. We 
have also relied on the fact that if E^ = then Kf \ = Xi t i, 
where Xij is the transmitted signal from the source at time i. 
Finally, we are currently examining i 6 {1, (1 — T> 2 )n}, for 
which £ 2< i = by definition. If E 2 ,i = £2,1 — we clearly 
also have E^ = 0. By (|59j» we have Pr[^ p 4 = | E 2 = 
£ 2 ] = Pr[E 2< i — 0] = (1 — 8 2 )- In (c), we have relied 
on the fact that H(Xx ti \ Y 3a = e) = H(Xi a ) < 1 and 

H(x lti I r 3 ,, = 0) = H(x lti 1 r 3il = 1) = 0. 

We now turn to the components of the second sum in d67l i. 

Let i e {(1 - V 2 )n + 1, (1 - Vf)n}. 



H(Y£* I f 2 BP E 2 = £ 2 , Y 3 ) < 



< H(Y% I Y 2 w n 



Y B 
...in 



£2) 



= H(Xi ti I Y a 2i ,Yz, i ,'E 2 = £ 2 ) ■ (1 - 82) 

< S 3 (l - (1 - 82Y- 1 ) (1 - 82) 

The analysis follows in the line of the derivation leading 
to d68l ), and we will elaborate only on the differences. Recall 
that each component at indices i € {(1 — T> 2 )n+ 1, ...,{1 — 
I? 2 p )n} was discovered in the application of Simplified BP. 
Let ji,...,jd-i be the indices of the other variable nodes 
that were connected to a check node by which index i was 
discovered. By nature of the Simplified BP algorithm, these 
indices necessarily correspond to bits that were discovered 
at previous iterations of Simplified BP. Thus, since we have 
assumed that the indices i = 1, n are arranged by the order 
in which components of Y 2 P were discovered by Simplified 
BP, we have {ji, C {1, — 1}. (a) now follows. 

In (b), we have defined F 2 ^ = Y 2 BF n + ... + Y% jd _^ 
where addition is modulo-2. In (c), we have relied on the 

BP *■ *■ 

fact that Y 2 j is erasure if any of Y 2 BP j 1 , Y 2 w ^_ i is erasure. 
Conditioned on E 2 = £ 2 , each of these variables is erasure 
with probability 8 2 (this follows as in our above analysis 
H(Y? | Y 3 ) = H<yf | E BP , E 2 , Y 3 )



^ Ec 

£-2 



E< 



ff(Y BP |E B1 \E 2 = 5 2 ,Y 3 

n 

E ff (*£< I Ki, ^-i, E 2 P , E 2 = f 2 , Y 3 ) 



(1-X> 2 )n 

E I f 2 B i, ^-i, E 2 P , E 2 = € 2 , Y 3 )+ 

+ E ff I .... E B /, E 2 = £ 2 , Y 3 ) - 

i=(l-X> 2 )n+l 
n 

+ E I E 2 = £ 2 , Y 3 ) 

i=(l-X>!f)n+l 



(67) 



of Pr[£^ = | E 2 = £ 2 ]) and thus PrfF^ = e] = 
l_(l_^ 2 )d-i. 
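
The erasure probability of the modulo-2 sum quoted above can be checked by brute force. The sketch below assumes that each of the d − 1 summands is erased independently with the same probability (named `delta_hat` here, a hypothetical parameter name) and that the sum is erased whenever any summand is erased; it compares exact enumeration against the closed form 1 − (1 − delta_hat)^(d−1).

```python
from itertools import product

def sum_erasure_prob_enumerated(d, delta_hat):
    """Exact probability that a sum of d-1 independently erased terms is erased."""
    total = 0.0
    for pattern in product((False, True), repeat=d - 1):  # True = erased
        prob = 1.0
        for erased in pattern:
            prob *= delta_hat if erased else 1.0 - delta_hat
        if any(pattern):
            total += prob
    return total

def sum_erasure_prob_closed_form(d, delta_hat):
    """Closed form: the sum survives only if all d-1 terms survive."""
    return 1.0 - (1.0 - delta_hat) ** (d - 1)

if __name__ == "__main__":
    d, delta_hat = 6, 0.2
    print(sum_erasure_prob_enumerated(d, delta_hat),
          sum_erasure_prob_closed_form(d, delta_hat))
```

For a check node of degree d = 6 and erasure probability 0.2, both expressions give 1 − 0.8^5.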

We now return to d67l >. Relying on the above discussion, 
we now have the string of equations ending with (l69l on the 
following page. In (a), 2? 2 is the erasure rate of the channel 
output Y 2 at the relay. Its expected value is clearly <5 2 . We have 
also relied on Corollary Q] (Appendix IIV-AI above) to express 
the expected values of Djf, recalling that it is identically 
distributed as D\*. 

This bound ^ concludes our analysis of H (Yf |Ef , Y 3 ). 



B. Upper Bound on ff(Ef ) 

As noted above, our bound in this section applies depen- 
dencies that exist between erasures at the output of BP. An 
outline of the main idea behind the proof is again provided in 
Sec. IIII-FI We begin as follows. 



ff(E BP ) < ff(E BP ,E B 



ff(E BP ) + H (E BP | E BP ) 



(70) 



where Ejf is as defined in Appendix IIV1 following d59l ). 
Focusing on the second term on the right hand side of ( ITOl i 
we obtain, 

n 

tf(E 2 "|E B /) < E^(^l^) 

i=l 
n 

( => E^ 2 )(i-o 



n[(l~5l P )-h(8 2 ) + o(l) 



In (a), we have observed that if E 2 ,p i = e, then by 
e with probability 1 and thus ff(-^2i 



(71) 



= 0. If 

>,i = 0) = 
= e] as in 

Appendix [TV] Finally, (b) follows in the same lines as in our 
derivation of (l63l in Appendix IIVI 



£F 4 = then Ef- 



E 2 and thus H(E* p a \ E, 
11(82)- We have also defined r) 2 w i to equal Pr[£J| 



We now turn to the first term in (T70b . 



#(E B 2 P ) 



< 



(a) 
< 



H(Ef,D?) 
# (E B 2 P I Df) 



H(Df) 
log(n + 1) 



(72) 



where in (a) we have defined $D_2^{BP}$ to equal the erasure rate of
$\mathbf{E}_2^{BP}$ (see Definition 2). In (b) we have relied on the fact that
$D_2^{BP}$ is confined to the set $\{0, 1/n, 2/n, \ldots, 1\}$, which contains
$n + 1$ elements.

The vector $\mathbf{E}_2^{BP}$ specifies the bits that remained undecoded
at the output of BP. Di et al. [14, Lemma 1.1] proved that
these bits correspond to a stopping set of the code $C$ (see [14]
for its definition). For $s \in \{0, \ldots, n\}$, let $N_C(s)$ denote the number
of stopping sets of size $s$ in $C$. We thus have,



$$H(\mathbf{E}_2^{BP} \mid D_2^{BP} = \alpha) \le \log N_C(\alpha n)$$

Plugging this into (72) we obtain the string of equations ending
with (73) on the following page. Note that the expectations
in these equations are over $D_2^{BP}$. We begin by examining the
second additive term in (73).



E 



log Nc(Dfn) 
x Pr 



\D% 



|L> BP - (5? 



-1/3 

,1/3 



X 



2 u 2 

2 P | > 

< log(2") • 2e~' 9 / 2 -" 1/3 = o(l) 



(74) 



where we have relied on the fact that the number $N_C(D_2^{BP} n)$
of stopping sets of size $D_2^{BP} n$ is trivially at most $2^n$, which
is the number of subsets of the indices $\{1, \ldots, n\}$. We have
also applied Lemma 3 (Appendix IV-A).

We now turn to the first additive term in (73). Burshtein and
Miller [7, Theorem 9] and Orlitsky et al. [43, Theorem 5]
examined $E[N_C(\alpha n)]$, where the expectation is over all codes
$C$ in the $(\lambda, d)$ LDPC ensemble. From their development we
have, for all $\alpha = k/n$, $k = 0, \ldots, n$,

$$\frac{1}{n} \log E\left[N_C(\alpha n)\right] \le f(\alpha) + o(1) \tag{75}$$
ff(Y?|E?,Y 3 ) < E £ n V 2 -8 3 (l-8 2 )+n(V 2 -Vf)8 3 [l-(l-8 2 ) 



(1 



n [S 2 ■ 8 3 (1 - 8 2 ) + (8 2 - S B 2 e )8 3 (1 - (1 - 5 2 ) d - 1 J (1 - 8 2 ) + o(l) 
n • <5 3 (1 - 4) [(1 - 8 2 ) + (<5 2 - 6?) ( 1 - (1 - fc)*- 1 ) + o(l)' 



(69) 



iJ(E B 2 p ) < E log N c (D™n) 



= E 
E 



log N c (Dfn) 
log N e (Dfn) 



+ log(n + 1) 
\Df -8f\< nr 1 / 3 
\D? - 8f\ > n-V3' 



Pr 
Pr 



|£>i p - 8f\ > n" 1 / 3 



log(n + 1) 



(73) 



where $f(\alpha)$ is given by (32) and the term $o(1)$ is independent
of $\alpha$. A few minor remarks are deferred to Appendix V-C
below.

In (75), the expectation is over all codes $C$ in our ensemble.
In our analysis, however, we are interested in the probability
that an individual code $C$ has $\frac{1}{n} \log N_C(\alpha n)$ that greatly exceeds
$f(\alpha)$. As in the proof of Lemma 3 (Appendix IV-A), we apply
Markov's inequality to bound this probability. For fixed $n$ we
let $f(\alpha; n)$ denote the left hand side of (75). We now derive,



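$$\Pr\left[N_C(\alpha n) > e^{n(f(\alpha;n) + n^{-1/2})}\right] \le \frac{E[N_C(\alpha n)]}{e^{n(f(\alpha;n) + n^{-1/2})}} = \frac{e^{n f(\alpha;n)}}{e^{n(f(\alpha;n) + n^{-1/2})}} = e^{-n^{1/2}}$$

where the probability is over the random selection of a code
$C$ from the $(\lambda, d)$ ensemble. By a union bound we obtain,

$$\Pr\left[\exists\, \alpha \in \{0, 1/n, 2/n, \ldots, 1\} : N_C(\alpha n) > e^{n(f(\alpha;n) + n^{-1/2})}\right] \le (n + 1) \cdot e^{-n^{1/2}}$$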



By these results, for large enough $n$, with probability at least
$1 - \exp(-n^{1/3})$ (as required by the conditions of the theorem), a
randomly selected code $C$ satisfies,



C. Some Remarks Regarding Equation (75)

Our expression (32) for $f(\alpha)$ is a slight variation of [43,
Theorem 5] ($\gamma(\alpha)$ in their notation). In [43], expressions for
the minimizers $x$ and $y$ of the various minimizations (denoted
$x_0$ and $y_0$) are provided, and the expression for $f(\alpha)$ is
provided as a function of them. The range of the maximization
of $\beta$ is also different from the one we used in (32). An
examination of their proof shows that these differences do not
affect the final outcome.

We now discuss (75) (most importantly, with the $o(1)$ term
being independent of $\alpha$). To justify its validity, we argue
that in [43, Theorem 5], adding the term $\frac{1}{n} \log n$ to the
right hand side of the equation produces an upper bound on
$\frac{1}{n} \log E[N_C(\alpha n)]$ for all $n$. To see this, observe that in [43,
Lemmas 3 and 4] each limit may be replaced by a supremum
over all $n$. This holds by replacing the asymptotic saddle-point
analysis in the lemmas' proofs with an upper bound as in [7,
Equation (6)]. In [43, Equation (11)], where these lemmas
were applied, we may discard the limit, replace the sum by a
supremum, and add a compensation term $\frac{1}{n} \log n$, to obtain
a bound on $\frac{1}{n} \log E[N_C(\alpha n)]$ rather than an evaluation of
its limit. The desired result will then follow as in the proof
of [43, Theorem 5].



$$\frac{1}{n} \log N_C(\alpha n) \le f(\alpha) + o(1) \quad \forall \alpha \in \{0, 1/n, 2/n, \ldots, 1\} \tag{76}$$

In the remainder of the proof below, we will assume that our $C$
satisfies (76).
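The claim that the union bound $(n+1) e^{-n^{1/2}}$ eventually falls below the target failure probability $\exp(-n^{1/3})$ can be checked numerically. The following is a minimal sketch; the function names and the search limit `n_max` are illustrative assumptions, not part of the original analysis.

```python
import math

def union_bound(n: int) -> float:
    """(n + 1) * exp(-sqrt(n)): union bound over the n + 1 values of alpha."""
    return (n + 1) * math.exp(-math.sqrt(n))

def target(n: int) -> float:
    """exp(-n^(1/3)): the failure probability the bound must not exceed."""
    return math.exp(-n ** (1.0 / 3.0))

def first_n_where_bound_holds(n_max: int = 10_000) -> int:
    """Smallest n such that union_bound(m) <= target(m) for every m in [n, n_max]."""
    last_bad = 0
    for n in range(1, n_max + 1):
        if union_bound(n) > target(n):
            last_bad = n
    return last_bad + 1

if __name__ == "__main__":
    print("bound holds from n =", first_n_where_bound_holds())
```

The comparison reduces to $\sqrt{n} - n^{1/3} \ge \ln(n+1)$, which fails for small $n$ and holds once $n$ is moderately large, in line with the "for large enough $n$" qualifier.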

Combining (73), (74) and (76) we obtain,



tf(E B /) < n{E[f(Df) 
= n\f(Sf) + o(l) 



\D% - 8?\ < n 



-1/3 



(77) 



where we have invoked the continuity of $f(\alpha)$, which holds
by [43, Corollary 6]. Finally, combining (70), (71) and (77)
we obtain our desired bound on $H(\mathbf{E}_2^{BP})$.



H(tif ) < n f(S\ 



8?) ■ h{8 2 ) + o(l) 



(78) 
□ 



D. Justification of the l.d.f. operator 



The operator l.d.f. in ( 1281 ) is easily justified by the fact that 
I{Y\^\Y^ j Y3) must be descending as a function of 8 2 . To 



see this, let 8 2 < 8 2 and let Yf 



and Y? p be random vectors 



whose components are defined based on d20b . 



Y, 



2d 



= Y, 



2.i 



Yo 



= Y 



2,i 



E" 



where the components E' 2 i and E 2 i are independent and 
distributed as Erasure^) and Erasure^' ), respectively. Then 
Yif is stochastically degraded with respect to Yjf and thus 



7(Yf;Yf |Y 3 



< /(Y5 P ;Y|' 



monotonicity of Z(Y^ P ; Y^ p 



Y3), implying the desired 
Y3) as a function of 8 2 . 
Appendix VI
Proofs for Sec. III-G

A. Proof of Theorem 5

We begin by introducing the following notation, for an
arbitrary code $C$ and $\delta \in [0, 1]$,

$$I(C; \delta) = I(\mathbf{X}; \mathbf{Y})$$

where $\mathbf{X}$ is uniformly distributed within the codewords of $C$
and $\mathbf{Y}$ is randomly related to $\mathbf{X}$ via the transition probabilities
of a BEC($\delta$). By Fano's inequality (e.g. [13, Sec. 8.9]) and
by virtue of the "goodness" of the sequence $C_n$, the following
must hold at $\delta = \delta^*$:

$$\lim_{n \to \infty} \frac{1}{n} I(C_n; \delta^*) = R = 1 - \delta^* \tag{79}$$



The following lemma relates $I(C_n; \delta)$ to our desired
$P_{MAP}(C_n; \delta)$, and parallels [23, Expression (1)]. The lemma
extends results from [44].

Lemma 4: The following holds for any linear code $C$ and
$\delta \in [0, 1]$,

$$\frac{d}{d\delta} I(C; \delta) = -\frac{n}{\delta} \cdot P_{MAP}(C; \delta) \tag{80}$$



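Proof: We begin with the following identity, which
follows from [44, Expression (8)] (similar expressions are
available in [39, Theorem 1] and [2, Theorem 1]):

$$\frac{d}{d\delta} I(C; \delta) = \sum_{i=1}^{n} E\left[\frac{\partial \ln P^{\delta}_{Y_i|X_i}(Y_i \mid X_i)}{\partial \delta} \cdot \log P^{\delta}_{X_i|\mathbf{Y}}(X_i \mid \mathbf{Y})\right] \tag{81}$$

The expectation is over both $\mathbf{X}$ and $\mathbf{Y}$. $P^{\delta}_{Y_i|X_i}(y \mid x)$ and
$P^{\delta}_{X_i|\mathbf{Y}}(x \mid \mathbf{y})$ denote the conditional probability functions
corresponding to $\mathbf{X}$ and $\mathbf{Y}$, where the superscript $\delta$ denotes the
BEC erasure probability.
Rewriting (81) we obtain,

$$\frac{d}{d\delta} I(C; \delta) = \sum_{i=1}^{n} E\left\{ E\left[\frac{\partial \ln P^{\delta}_{Y_i|X_i}(Y_i \mid X_i)}{\partial \delta} \times \log P^{\delta}_{X_i|\mathbf{Y}}(X_i \mid \mathbf{Y}) \,\Big|\, \mathbf{Y}\right]\right\} \tag{82}$$

where the first expectation is over $\mathbf{Y}$ and the second over $X_i$.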

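Let $\hat{X}_i(\mathbf{y})$ denote the MAP decoder output corresponding
to a channel output vector $\mathbf{y}$ and index $i$. As mentioned
in Sec. III-G, this output is obtained by mapping the a
posteriori probability $P^{\delta}_{X_i|\mathbf{Y}}(1 \mid \mathbf{y})$ to the set $\{0, 1, e\}$,

$$\hat{X}_i(\mathbf{y}) = \begin{cases} x, & P^{\delta}_{X_i|\mathbf{Y}}(1 \mid \mathbf{y}) = x \in \{0, 1\}; \\ e, & P^{\delta}_{X_i|\mathbf{Y}}(1 \mid \mathbf{y}) = 1/2, \end{cases}$$

where we have relied on the fact that since $C$ is linear,
$P^{\delta}_{X_i|\mathbf{Y}}(1 \mid \mathbf{y})$ is guaranteed to be in the set $\{0, 1, 1/2\}$ [48,
Sec. 3.2.1].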

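If $\hat{X}_i(\mathbf{y}) = x \in \{0, 1\}$, the transmitted $X_i$, conditioned
on $\mathbf{Y} = \mathbf{y}$, equals $x$ with probability 1, and thus
$\log P^{\delta}_{X_i|\mathbf{Y}}(X_i \mid \mathbf{y}) = 0$ with probability 1. If $\hat{X}_i(\mathbf{y}) = e$ we
have $P^{\delta}_{X_i|\mathbf{Y}}(x_i \mid \mathbf{y}) = 1/2$ for $x_i \in \{0, 1\}$. Furthermore, $\mathbf{y}$
must clearly satisfy $y_i = e$ (or else $\hat{X}_i(\mathbf{y}) = e$ cannot hold)
and thus $P^{\delta}_{Y_i|X_i}(y_i \mid x_i) = \delta$ for $x_i \in \{0, 1\}$. We can now
rewrite (82) as,

$$\frac{d}{d\delta} I(C; \delta) = E_{\mathbf{Y}}\left[\sum_{i : \hat{X}_i(\mathbf{Y}) = e} \frac{\partial \ln \delta}{\partial \delta} \cdot \log(1/2)\right] = -\frac{1}{\delta} \cdot E_{\mathbf{Y}}\left(\left|\{i : \hat{X}_i(\mathbf{Y}) = e\}\right|\right)$$

The desired (80) now follows by the definition of $P_{MAP}(C; \delta)$.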

□ 
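The derivative identity of Lemma 4 can be sanity-checked numerically on a small linear code. The sketch below is an illustration under stated assumptions: it uses the standard facts that, over a BEC, $I(\mathbf{X}; \mathbf{Y})$ in bits equals the expected GF(2) rank of the generator-matrix columns at the unerased positions, and that bit $i$ remains erased after MAP decoding exactly when its column lies outside the span of those columns. The generator matrix `COLUMNS` and all function names are hypothetical choices made for this example, and $P_{MAP}$ is taken to be the expected fraction of erased output bits.

```python
from itertools import product

def gf2_rank(vectors):
    """Rank over GF(2) of bit-vectors encoded as non-negative ints."""
    pivots = {}  # leading-bit position -> stored row with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]  # clear the leading bit and keep reducing
    return len(pivots)

def mi_and_pmap(columns, delta):
    """Return (I(C; delta) in bits, P_MAP(C; delta)) for a BEC(delta)."""
    n = len(columns)
    mi = 0.0
    erased_bits = 0.0
    for pattern in product((False, True), repeat=n):  # True marks an erasure
        weight = sum(pattern)
        prob = delta ** weight * (1.0 - delta) ** (n - weight)
        known = [c for c, e in zip(columns, pattern) if not e]
        r = gf2_rank(known)
        mi += prob * r
        # Bit i stays erased iff its column is outside the span of `known`.
        erased_bits += prob * sum(
            1 for c, e in zip(columns, pattern)
            if e and gf2_rank(known + [c]) > r
        )
    return mi, erased_bits / n

# Columns of a hypothetical [7, 4] generator matrix (illustration only).
COLUMNS = [0b0001, 0b0010, 0b0100, 0b1000, 0b1011, 0b1101, 0b1110]

def check_lemma(delta=0.3, step=1e-5):
    """Compare a central-difference derivative of I with -(n/delta) * P_MAP."""
    n = len(COLUMNS)
    upper, _ = mi_and_pmap(COLUMNS, delta + step)
    lower, _ = mi_and_pmap(COLUMNS, delta - step)
    derivative = (upper - lower) / (2 * step)
    _, pmap = mi_and_pmap(COLUMNS, delta)
    return derivative, -n / delta * pmap

if __name__ == "__main__":
    print(check_lemma())
```

The two returned values should agree up to the finite-difference error. Exhaustive enumeration of erasure patterns limits this check to short codes.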

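We now prove (34) for $\delta > \delta^*$. Let $\epsilon > 0$ and assume that
$P_{MAP}(C; \delta_0) < \delta_0 - \epsilon$ for some $\delta_0 > \delta^* + \epsilon$. By (80) we have,

$$\begin{aligned}
\frac{1}{n} I(C; \delta^*) &= \frac{1}{n} I(C; 1) + \int_{\delta^*}^{1} \frac{1}{\delta} \cdot P_{MAP}(C; \delta)\, d\delta \\
&\overset{(a)}{=} \int_{\delta^*}^{1} \frac{1}{\delta} \cdot P_{MAP}(C; \delta)\, d\delta \\
&\overset{(b)}{\le} \int_{[\delta^*, \delta_0 - \epsilon] \cup [\delta_0, 1]} \frac{1}{\delta} \cdot \delta\, d\delta + \int_{\delta_0 - \epsilon}^{\delta_0} \frac{1}{\delta} (\delta_0 - \epsilon)\, d\delta \\
&= (1 - \delta^*) - \left[\epsilon - (\delta_0 - \epsilon) \ln \frac{\delta_0}{\delta_0 - \epsilon}\right] \\
&\overset{(c)}{=} (1 - \delta^*) - h(\delta_0, \epsilon)
\end{aligned}$$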

In (a), we have used the fact that $I(C; 1) = 0$, which can
be verified straightforwardly. In (b) we have relied on the fact
that the MAP decoder outputs no more erasures than it obtains
via the channel output, and thus $P_{MAP}(C; \delta) \le \delta$. Also, to
obtain that $P_{MAP}(C; \delta) < \delta_0 - \epsilon$ for $\delta \in [\delta_0 - \epsilon, \delta_0]$ we
have relied on our assumption $P_{MAP}(C; \delta_0) < \delta_0 - \epsilon$ and on
the fact that $P_{MAP}(C; \delta)$ is non-descending as a function of $\delta$.
This holds because if $\delta_1 < \delta_2$, then BEC($\delta_2$) is stochastically
degraded with respect to BEC($\delta_1$). In (c) we have simply
defined $h(\delta_0, \epsilon)$ to equal the content of the brackets in the
preceding equality.

$h(\delta_0, \epsilon)$ clearly approaches zero as $\epsilon \to 0$. It is also strictly
positive for all $\epsilon > 0$. This follows from $\ln(\delta_0 / (\delta_0 - \epsilon)) <
\epsilon / (\delta_0 - \epsilon)$, which holds by the well-known inequality $\ln(1 +
x) < x$ for all $x \ne 0$, $x > -1$.
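As a quick numerical illustration of these two properties, the sketch below evaluates $h(\delta_0, \epsilon) = \epsilon - (\delta_0 - \epsilon) \ln(\delta_0 / (\delta_0 - \epsilon))$, the bracketed quantity defined in (c). The function name and the sample values are assumptions made for the example only.

```python
import math

def h(delta0: float, eps: float) -> float:
    """h(delta0, eps) = eps - (delta0 - eps) * ln(delta0 / (delta0 - eps)).

    Defined for 0 < eps < delta0, matching the range used in the proof.
    """
    if not 0.0 < eps < delta0:
        raise ValueError("need 0 < eps < delta0")
    return eps - (delta0 - eps) * math.log(delta0 / (delta0 - eps))

if __name__ == "__main__":
    for eps in (0.2, 0.02, 0.002):
        # Values stay positive and shrink as eps decreases.
        print(eps, h(0.5, eps))
```

For small $\epsilon$ the value behaves like $\epsilon^2 / (2\delta_0)$, so the gap to $1 - \delta^*$ is small but strictly positive.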

The desired result (34) at $\delta > \delta^*$ now follows using the
following argument: Let $\epsilon > 0$ be small enough such that $\delta^* <
\delta_0 - \epsilon$. Then for large enough $n$, we must have $P_{MAP}(C_n; \delta_0) \ge
\delta_0 - \epsilon$, or else $\frac{1}{n} \cdot I(C_n; \delta^*) \le (1 - \delta^*) - h(\delta_0, \epsilon) < 1 - \delta^*$,
thus violating (79).

We now turn to prove (34) in the range $\delta < \delta^*$. The proof
is obtained straightforwardly by examining the output of ML
decoding. An ML decoder can be perceived as a suboptimal
bitwise estimator, which is not allowed to output erasures.
The bit error rate (normalized by $n$) at the output of ML
decoding cannot exceed the word error rate (denoted $P_e(C_n; \delta)$
in Definition 4), because the worst-case number of bit errors
in a decoded codeword cannot exceed $n$. By the "goodness" of
$\{C_n\}$, $P_e(C_n; \delta)$ must approach zero for $\delta < \delta^*$. The bit error
rate with optimal estimation equals half the erasure rate at the
output of bitwise MAP estimation as defined in Sec. III-G (the
optimal bitwise estimator makes a uniform random selection in $\{0, 1\}$
whenever the MAP estimator of Sec. III-G outputs an erasure).
This bit error rate cannot exceed $P_e(C_n; \delta)$, and thus must approach
zero as well.



□ 



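B. Proof of Theorem 6

We now focus on fixed $n$. For simplicity of notation, we
drop the index $n$ in $C_n$. We begin by examining the right hand
side of (36). Applying the chain rule for mutual information,

$$I(\hat{\mathbf{Y}}_2; \mathbf{Y}_2 \mid \mathbf{Y}_3) = I(\hat{\mathbf{Y}}_2; \mathbf{Y}_2, \mathbf{Y}_3) - I(\hat{\mathbf{Y}}_2; \mathbf{Y}_3) \tag{83}$$

We now examine the first term on the right hand side of (83).

$$I(\hat{\mathbf{Y}}_2; \mathbf{Y}_2, \mathbf{Y}_3) = H(\hat{\mathbf{Y}}_2) - H(\hat{\mathbf{Y}}_2 \mid \mathbf{Y}_2, \mathbf{Y}_3) = \left[H(\hat{\mathbf{Y}}_2 \mid \mathbf{X}_1) + I(\mathbf{X}_1; \hat{\mathbf{Y}}_2)\right] - H(\hat{\mathbf{Y}}_2 \mid \mathbf{Y}_2) \tag{84}$$

where $\mathbf{X}_1$ is the transmitted codeword, and is uniformly
distributed in $C$. The term $I(\mathbf{X}_1; \hat{\mathbf{Y}}_2)$ in (84) will cancel out
later. The other two terms can be evaluated,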



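$$H(\hat{\mathbf{Y}}_2 \mid \mathbf{X}_1) \overset{(a)}{=} \sum_{i=1}^{n} H(\hat{Y}_{2,i} \mid \mathbf{X}_1, \hat{Y}_{2,1}, \ldots, \hat{Y}_{2,i-1}) \overset{(b)}{=} \sum_{i=1}^{n} H(\hat{Y}_{2,i} \mid X_{1,i}) \overset{(c)}{=} n \cdot h(\delta_2 \circ \hat{\delta}_2) \tag{85}$$

where (a) follows by the chain rule for entropy, (b) follows
by the Markov relation $\hat{Y}_{2,i} \leftrightarrow X_{1,i} \leftrightarrow (\mathbf{X}_1, \hat{Y}_{2,1}, \ldots, \hat{Y}_{2,i-1})$.
Finally, (c) follows by (12) and (5), together with the definition of $\hat{\mathbf{Y}}_2$. Similarly, we obtain:

$$H(\hat{\mathbf{Y}}_2 \mid \mathbf{Y}_2) = n \cdot (1 - \delta_2) h(\hat{\delta}_2) \tag{86}$$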



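We now proceed to the second term on the right hand side
of (83).

$$\begin{aligned}
I(\hat{\mathbf{Y}}_2; \mathbf{Y}_3) &\overset{(a)}{=} I(\mathbf{Y}_3; \mathbf{X}_1, \hat{\mathbf{Y}}_2) - I(\mathbf{Y}_3; \mathbf{X}_1 \mid \hat{\mathbf{Y}}_2) \\
&\overset{(b)}{=} I(\mathbf{Y}_3; \mathbf{X}_1) - I(\mathbf{Y}_3; \mathbf{X}_1 \mid \hat{\mathbf{Y}}_2) \\
&\overset{(c)}{=} I(\mathbf{Y}_3; \mathbf{X}_1) - \left[I(\mathbf{Y}_3, \hat{\mathbf{Y}}_2; \mathbf{X}_1) - I(\hat{\mathbf{Y}}_2; \mathbf{X}_1)\right]
\end{aligned} \tag{87}$$

(a) and (c) follow by the chain rule for mutual information,
and (b) follows by the Markov relation $\mathbf{Y}_3 \leftrightarrow \mathbf{X}_1 \leftrightarrow \hat{\mathbf{Y}}_2$. We
now argue that the following holds: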



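$$I(\mathbf{Y}_3; \mathbf{X}_1) = n \cdot \left[(1 - \delta_3) + o(1)\right] \tag{88}$$

To show this, we invoke the analysis of Appendix VI-A. With
the notation of that appendix, $I(\mathbf{Y}_3; \mathbf{X}_1) = I(C; \delta_3)$. By
Lemma 4 the following holds.

$$\begin{aligned}
\frac{1}{n} I(C; \delta_3) &\overset{(a)}{=} \frac{1}{n} I(C; \delta^*) - \int_{\delta^*}^{\delta_3} \frac{1}{\delta} P_{MAP}(C; \delta)\, d\delta \\
&\overset{(b)}{\ge} \left[(1 - \delta^*) + o(1)\right] - (\delta_3 - \delta^*) \\
&= (1 - \delta_3) + o(1)
\end{aligned} \tag{89}$$

In (a), $\delta^* = 1 - R$ is the Shannon limit for rate $R$
(Definition 1). We have relied on (35) and the fact that
$\delta_2 \circ \hat{\delta}_2 \le 1$ to deduce $\delta_3 \ge \delta^*$. In (b), we have invoked
the "goodness" of the sequence $\{C_n\}_{n=1}^{\infty}$ and (79) to obtain
$\frac{1}{n} \cdot I(C; \delta^*) = (1 - \delta^*) + o(1)$. We have also relied on
$P_{MAP}(C; \delta) \le \delta$ as explained in Appendix VI-A. Finally, (88)
is obtained from (89) by the observation that the capacity of
a BEC($\delta_3$) is $1 - \delta_3$ and thus $\frac{1}{n} \cdot I(C; \delta_3) \le 1 - \delta_3$.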
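Turning to $I(\mathbf{Y}_3, \hat{\mathbf{Y}}_2; \mathbf{X}_1)$, we begin by arguing that

$$I(\mathbf{Y}_3, \hat{\mathbf{Y}}_2; \mathbf{X}_1) = I(\mathbf{Z}; \mathbf{X}_1) \tag{90}$$

where $\mathbf{Z} = \mathbf{Y}_3 \cdot \hat{\mathbf{Y}}_2$ (the multiplication in $\mathbf{Y}_3 \cdot \hat{\mathbf{Y}}_2$ is as defined
earlier in the paper). $I(\mathbf{Y}_3, \hat{\mathbf{Y}}_2; \mathbf{X}_1) \ge I(\mathbf{Z}; \mathbf{X}_1)$ holds straightforwardly
by the data processing inequality. To see why the reverse
inequality holds, we define vectors $\tilde{\mathbf{Y}}_2$ and $\tilde{\mathbf{Y}}_3$ that are
stochastic functions of $\mathbf{Z}$, such that the joint distribution of
the pair $(\tilde{\mathbf{Y}}_2, \tilde{\mathbf{Y}}_3)$ and $\mathbf{X}_1$ is identical to that of $(\hat{\mathbf{Y}}_2, \mathbf{Y}_3)$ and
$\mathbf{X}_1$. The inequality will then again be obtained by the data
processing inequality.

Recall that by (12), the components of $\mathbf{Y}_3$ are related to
those of $\mathbf{X}_1$ via the memoryless BEC($\delta_3$). By (12) and (16),
the components of $\hat{\mathbf{Y}}_2$ are related to those of $\mathbf{X}_1$ via the
memoryless BEC($\delta_2 \circ \hat{\delta}_2$). We define the components of
$(\tilde{\mathbf{Y}}_2, \tilde{\mathbf{Y}}_3)$ in the following way: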



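$$(\tilde{Y}_{2,i}, \tilde{Y}_{3,i}) = \begin{cases}
(e, e), & Z_i = e; \\
(e, Z_i), & Z_i \ne e, \text{ with probability } \epsilon_1; \\
(Z_i, e), & Z_i \ne e, \text{ with probability } \epsilon_2; \\
(Z_i, Z_i), & Z_i \ne e, \text{ with probability } \epsilon_3,
\end{cases}$$

where,

$$\epsilon_1 = \frac{(\delta_2 \circ \hat{\delta}_2) \cdot (1 - \delta_3)}{1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3}, \quad
\epsilon_2 = \frac{(1 - \delta_2 \circ \hat{\delta}_2) \cdot \delta_3}{1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3}, \quad
\epsilon_3 = \frac{(1 - \delta_2 \circ \hat{\delta}_2) \cdot (1 - \delta_3)}{1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3}$$

Simple arithmetic now reveals that $(\tilde{Y}_{2,i}, \tilde{Y}_{3,i})$, conditioned on
the value of $X_{1,i}$, are distributed identically as $(\hat{Y}_{2,i}, Y_{3,i})$.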
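The "simple arithmetic" can be spelled out with exact rational numbers. The sketch below is an illustration only: it assumes that the operator $\circ$ denotes the erasure probability of two cascaded erasure channels, $a \circ b = a + b - ab$ (the operator is defined elsewhere in the paper), and it tracks only which of the two outputs is erased, since an unerased output always equals the transmitted bit. All function names are hypothetical.

```python
from fractions import Fraction

def compose(a, b):
    """Erasure probability of two cascaded BECs (assumed meaning of 'o')."""
    return a + b - a * b

def direct_joint(p, d3):
    """Law of (first output erased?, second output erased?) for independent
    BEC(p) and BEC(d3) outputs of the same input bit."""
    return {
        (True, True): p * d3,
        (True, False): p * (1 - d3),
        (False, True): (1 - p) * d3,
        (False, False): (1 - p) * (1 - d3),
    }

def joint_via_z(p, d3):
    """Same law generated from Z: Z is erased w.p. p * d3, otherwise one of
    three cases is drawn with probabilities eps1, eps2, eps3."""
    z_erased = p * d3
    denom = 1 - z_erased
    eps1 = p * (1 - d3) / denom
    eps2 = (1 - p) * d3 / denom
    eps3 = (1 - p) * (1 - d3) / denom
    assert eps1 + eps2 + eps3 == 1  # a valid probability split
    return {
        (True, True): z_erased,
        (True, False): denom * eps1,
        (False, True): denom * eps2,
        (False, False): denom * eps3,
    }

if __name__ == "__main__":
    p = compose(Fraction(3, 10), Fraction(1, 5))  # sample delta_2, hat-delta_2
    d3 = Fraction(2, 5)                           # sample delta_3
    print(direct_joint(p, d3) == joint_via_z(p, d3))
```

Because `Fraction` arithmetic is exact, equality of the two dictionaries confirms that the construction reproduces the joint erasure law for the sample parameters, without floating-point tolerance.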

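The channel from $\mathbf{X}_1$ to $\mathbf{Z}$ is a memoryless BEC with
erasure probability $(\delta_2 \circ \hat{\delta}_2) \cdot \delta_3$. By arguments similar to
those used to prove (86) we obtain,

$$I(\mathbf{Z}; \mathbf{X}_1) = n \cdot \left[\left(1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3\right) + o(1)\right] \tag{91}$$

Combining the relations derived above,
we obtain our desired (36).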



□ 
C. Proof that $R_{UB} \le R_{CF}$

We now argue that $R_{UB}$ in (38) is upper bounded by $R_{CF}$
in (19). Let $R$ be contained in the set on the right hand side
of (38), and let $\hat{\delta}_2$ be an accompanying value as specified there.

First assume $R \le 1 - \delta_3$. A simple examination of the
content of the braces in (19) reveals that it is at least $1 - \delta_3$
for all $\hat{\delta}_2$, and thus $R_{CF} \ge 1 - \delta_3 \ge R$.

Now assume $R > 1 - \delta_3$. We first show that we may assume,
without loss of generality, that the condition $R \le 1 - (\delta_2 \circ \hat{\delta}_2) \cdot
\delta_3$ in (38) is satisfied by equality. If this does not hold, then
we can increase $\hat{\delta}_2$ until an equality is reached. Increasing $\hat{\delta}_2$
can only reduce the left hand side of (37) (see Appendix V-D
above). Thus, increasing $\hat{\delta}_2$ does not violate (37). Finally, by
Theorem 6, condition (37) implies condition (18) (this can be
seen by taking $n$ to infinity on the right hand side of (36)),
and thus $R = 1 - (\delta_2 \circ \hat{\delta}_2) \cdot \delta_3 \le R_{CF}$.

□ 

Appendix VII
Proof of Theorem 7

In our analysis, we focus on the first source-destination
pair. An outline of the proof was provided in Sec. IV-C.
We let $X_1^*$ and $X_2^*$ denote scalar random variables that are
distributed as in the discussion following (40) and (41). That
is, both are uniformly distributed in $\{\pm 1\}$. We also let $Y_1^*$
be a random variable that is related to them via the channel
transition equation (39).

We distinguish between two cases, $R \le I(X_2^*; Y_1^* \mid X_1^*)$
and $R > I(X_2^*; Y_1^* \mid X_1^*)$. In the first case, which is discussed
in Appendix VII-A below, we prove that $R \le R_{MUD}$. In the
second case, which is discussed in Appendix VII-B, we prove
$R \le R_{SUD}$. The desired (42) thus follows.

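A. Analysis in the Range $R \le I(X_2^*; Y_1^* \mid X_1^*)$

Our proof begins along lines similar to the proof of the
converse of the capacity of the multiple-access channel, [13,
Sec. 14.3.4].

$$\begin{aligned}
n \cdot 2R &\overset{(a)}{=} n \left(R_{1,n} + R_{2,n} + o(1)\right) \\
&\overset{(b)}{=} H(W_1, W_2) + n \cdot o(1) \\
&= I(W_1, W_2; \mathbf{Y}_1) + H(W_1, W_2 \mid \mathbf{Y}_1) + n \cdot o(1) \\
&= I(W_1, W_2; \mathbf{Y}_1) + H(W_1 \mid \mathbf{Y}_1) + H(W_2 \mid W_1, \mathbf{Y}_1) + n \cdot o(1) \\
&\overset{(c)}{=} I(W_1, W_2; \mathbf{Y}_1) + n \cdot o(1) + n \cdot o(1) + n \cdot o(1) \\
&\overset{(d)}{\le} \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}; Y_{1,i}) + n \cdot o(1) \\
&\overset{(e)}{\le} n I(X_1^*, X_2^*; Y_1^*) + n \cdot o(1)
\end{aligned} \tag{92}$$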



In (a), $R_{1,n}$ and $R_{2,n}$ are the rates of the codes $C_{1,n}$ and $C_{2,n}$,
respectively, and the equality holds by the definition of $R$ as
the rate of the code sequences $\{C_{1,n}\}_{n=1}^{\infty}$ and $\{C_{2,n}\}_{n=1}^{\infty}$. In
(b), $W_1$ and $W_2$ are the messages that were transmitted by
Sources 1 and 2, respectively, defined as in [13, Sec. 14.3.4].
The equality holds because the two messages are statistically
independent, and uniformly distributed in $\{1, \ldots, 2^{nR_{1,n}}\}$
and $\{1, \ldots, 2^{nR_{2,n}}\}$, respectively. In (c), $H(W_1 \mid \mathbf{Y}_1) = n \cdot o(1)$
holds by Fano's inequality [13, Theorem 2.11.1], relying on
the fact that the probability of error in the decoding of $W_1$,
by the conditions of our Theorem 7, approaches zero with
$n$. The justification for $H(W_2 \mid W_1, \mathbf{Y}_1) = n \cdot o(1)$ will be
provided shortly. (d) follows by the same arguments as [13,
Equation (14.116)]. Finally, (e) follows by our discussion in
Appendix VII-C below.

By (92), recalling that we are now focusing our attention on
the range $R \le I(X_2^*; Y_1^* \mid X_1^*)$, we obtain by (40) (recalling
our above definitions of $X_1^*$, $X_2^*$ and $Y_1^*$), $R \le R_{MUD}$.

We now prove $H(W_2 \mid W_1, \mathbf{Y}_1) = n \cdot o(1)$ in the above
equality (c). Consider the scenario facing a decoder of $W_2$ (at
Destination 1), recalling the channel equation (39). As noted in
Sec. IV-C, given $W_1$, the decoder is able to eliminate $\mathbf{X}_1$, and
is thus faced with a point-to-point BIAWGN communication
scenario, with SNR $= h^2 / \sigma^2$. The capacity of this channel
is clearly $C(\mathrm{SNR}) = I(X_2^*; Y_1^* \mid X_1^*)$. By the fact that
$R \le I(X_2^*; Y_1^* \mid X_1^*)$ (we are currently focusing on such $R$),
we have that SNR $\ge$ SNR*, where SNR* is the Shannon
limit for rate $R$. By the "goodness" of $\{C_{2,n}\}_{n=1}^{\infty}$, recalling
Definition 1, we obtain that the probability of error, under
maximum-likelihood decoding, of $W_2$ given $\mathbf{Y}_1$ and $W_1$, must
approach zero with $n$. Thus, by Fano's inequality (as in our
analysis of $H(W_1 \mid \mathbf{Y}_1)$), we obtain $H(W_2 \mid W_1, \mathbf{Y}_1) = n \cdot o(1)$
as desired.



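B. Analysis in the Range $R > I(X_2^*; Y_1^* \mid X_1^*)$

Our analysis begins as in Appendix VII-A.

$$\begin{aligned}
n \cdot 2R &\overset{(a)}{=} I(W_1, W_2; \mathbf{Y}_1) + H(W_1 \mid \mathbf{Y}_1) + H(W_2 \mid W_1, \mathbf{Y}_1) + n \cdot o(1) \\
&\overset{(b)}{\le} I(W_1, W_2; \mathbf{Y}_1) + n \cdot o(1) + n\left(R - I(X_2^*; Y_1^* \mid X_1^*) + o(1)\right) + n \cdot o(1) \\
&\overset{(c)}{\le} \sum_{i=1}^{n} I(X_{1,i}, X_{2,i}; Y_{1,i}) + nR - n I(X_2^*; Y_1^* \mid X_1^*) + n \cdot o(1) \\
&\overset{(d)}{\le} n I(X_1^*, X_2^*; Y_1^*) + nR - n I(X_2^*; Y_1^* \mid X_1^*) + n \cdot o(1) \\
&\overset{(e)}{=} n I(X_1^*; Y_1^*) + nR + n \cdot o(1) \\
&\overset{(f)}{=} n R_{SUD} + nR + n \cdot o(1)
\end{aligned} \tag{93}$$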



(a) follows as in Appendix VII-A. In (b), $H(W_1 \mid \mathbf{Y}_1) = n \cdot o(1)$
follows again as in Appendix VII-A, and $H(W_2 \mid W_1, \mathbf{Y}_1) \le
n(R - I(X_2^*; Y_1^* \mid X_1^*) + o(1))$ will be justified shortly. (c)
and (d) follow as in Appendix VII-A. (e) follows by the chain
rule for mutual information [13, Theorem 2.5.2]. Finally, (f)
follows by (41), recalling our above definitions of $X_1^*$ and $Y_1^*$.

Subtracting $nR$ from both sides of the above inequality,
dividing by $n$ and taking $n$ to infinity, we obtain $R \le R_{SUD}$
as desired.

We now prove $H(W_2 \mid W_1, \mathbf{Y}_1) \le n(R - I(X_2^*; Y_1^* \mid X_1^*) +
o(1))$, as required in inequality (b) above.

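$$\begin{aligned}
H(W_2 \mid W_1, \mathbf{Y}_1) &\overset{(a)}{\le} H(\mathbf{X}_2 \mid W_1, \mathbf{Y}_1) + n \cdot o(1) \\
&= H(\mathbf{X}_2 \mid W_1) - I(\mathbf{X}_2; \mathbf{Y}_1 \mid W_1) + n \cdot o(1) \\
&\overset{(b)}{\le} n\left(R + o(1)\right) - I(\mathbf{X}_2; \mathbf{Y}_1 \mid W_1) + n \cdot o(1) \\
&\overset{(c)}{=} n\left(R + o(1)\right) - I(\mathbf{X}_2; \mathbf{Y}_2) + n \cdot o(1) \\
&\overset{(d)}{\le} nR - n\left(C(\mathrm{SNR}) + o(1)\right) + n \cdot o(1) \\
&\overset{(e)}{=} nR - n I(X_2^*; Y_1^* \mid X_1^*) + n \cdot o(1)
\end{aligned} \tag{94}$$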



(a) is proven in Appendix VII-D. (b) follows by the fact that
the cardinality of the range of the random vector $\mathbf{X}_2$ cannot
exceed that of $W_2$, which is $2^{nR_{2,n}}$, and $R_{2,n} = R + o(1)$.
In (c), we have defined $\mathbf{Y}_2 = \mathbf{Y}_1 - \mathbf{X}_1(W_1)$, where $\mathbf{X}_1(W_1)$
is the codeword corresponding to $W_1$. As noted in Sec. IV-C
and Appendix VII-A, by (39), the channel from $\mathbf{X}_2$ to $\mathbf{Y}_2$ is a
BIAWGN with SNR $= h^2 / \sigma^2$. The capacity of this channel is
clearly $C(\mathrm{SNR}) = I(X_2^*; Y_1^* \mid X_1^*)$. As we have confined our
attention to $R > I(X_2^*; Y_1^* \mid X_1^*)$, we have SNR $<$ SNR*, where
SNR* is again the Shannon limit for rate $R$. The justification
for (d) will be provided shortly. Finally, in (e) we have simply
rewritten $C(\mathrm{SNR}) = I(X_2^*; Y_1^* \mid X_1^*)$.

To justify (d), we argue that $I(\mathbf{X}_2; \mathbf{Y}_2) \ge n(C(\mathrm{SNR}) +
o(1))$. Had the SNR satisfied SNR $=$ SNR*, this would have
held trivially by the "goodness" of the code sequence $\{C_{2,n}\}_{n=1}^{\infty}$
and Fano's inequality. However, as mentioned above, we are
now interested in SNR $<$ SNR*. Our justification in this
range of SNR follows by arguments similar to those applied
above for the erasure channel. Specifically, we let $I(C; \mathrm{SNR}) = I(\mathbf{X}; \mathbf{Y})$ where $\mathbf{X}$ is
uniformly distributed within the code $C$ and $\mathbf{Y}$ is related to it
via the transitions of a BIAWGN. With this definition,
by our above discussion $I(\mathbf{X}_2; \mathbf{Y}_2) = I(C_{2,n}; \mathrm{SNR})$. We let
$C(\mathrm{SNR}^*)$ denote the capacity of a BIAWGN with an SNR of
SNR*. We now have,



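$$\begin{aligned}
n C(\mathrm{SNR}^*) &\overset{(a)}{=} I(C_{2,n}; \mathrm{SNR}^*) + n \cdot o(1) \\
&\overset{(b)}{=} I(C_{2,n}; \mathrm{SNR}) + \int_{\mathrm{SNR}}^{\mathrm{SNR}^*} \frac{1}{2} \mathrm{mmse}(C_{2,n}; \mathrm{snr})\, d\mathrm{snr} + n \cdot o(1) \\
&\overset{(c)}{\le} I(C_{2,n}; \mathrm{SNR}) + \int_{\mathrm{SNR}}^{\mathrm{SNR}^*} \frac{1}{2} n \cdot \mathrm{mmse}(\mathrm{bitwise}; \mathrm{snr})\, d\mathrm{snr} + n \cdot o(1) \\
&\overset{(d)}{=} I(C_{2,n}; \mathrm{SNR}) + n \cdot \left(C(\mathrm{SNR}^*) - C(\mathrm{SNR})\right) + n \cdot o(1)
\end{aligned} \tag{95}$$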

(a) follows by similar arguments to (79), relying on Fano's
inequality and the "goodness" of $\{C_{2,n}\}_{n=1}^{\infty}$. In (b), the mmse
function is defined as in (45) (Appendix I). The equality follows
from the relation between mutual information and the MMSE,
see [23, Equation (1)]. In (c), mmse(bitwise; snr) denotes the
MMSE in the estimation of a symbol $X$ which is uniformly
distributed in $\{\pm 1\}$, from $Y$ which is related to $X$ via a
BIAWGN with noise variance 1/snr. In such estimation, the
decoder does not have the benefit of the code structure to
draw upon, and so the estimation error clearly increases in
comparison to the estimation of a given bit in $C_{2,n}$. In (d),
we have relied on the fact that the derivative of the function
$C(\mathrm{SNR})$ with respect to SNR is $\frac{1}{2}$ mmse(bitwise; snr). This
follows from the discussion of [23, Sec. II.A] (see footnote 27). Finally,
recalling $I(\mathbf{X}_2; \mathbf{Y}_2) = I(C_{2,n}; \mathrm{SNR})$, we have our desired
result.
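The derivative relation used in step (d) can be illustrated numerically. The sketch below assumes the standard closed-form integrals for equiprobable $\pm 1$ inputs over a Gaussian channel, with mutual information measured in nats: $C(\mathrm{snr}) = \mathrm{snr} - E[\ln\cosh(\mathrm{snr} - \sqrt{\mathrm{snr}}\, Z)]$ and $\mathrm{mmse}(\mathrm{snr}) = 1 - E[\tanh(\mathrm{snr} - \sqrt{\mathrm{snr}}\, Z)]$ with $Z \sim N(0, 1)$. The function names, the quadrature grid and the sample SNR are assumptions made for this example.

```python
import math

def _gauss_expect(func, half_width=12.0, steps=4800):
    """E[func(Z)] for Z ~ N(0, 1), by the trapezoidal rule."""
    dz = 2 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(z) * math.exp(-0.5 * z * z)
    return total * dz / math.sqrt(2 * math.pi)

def _log_cosh(x):
    """Numerically stable ln(cosh(x))."""
    x = abs(x)
    return x + math.log1p(math.exp(-2.0 * x)) - math.log(2.0)

def capacity_nats(snr):
    """Mutual information (nats) of a +/-1 input over an AWGN channel."""
    return snr - _gauss_expect(lambda z: _log_cosh(snr - math.sqrt(snr) * z))

def mmse_bitwise(snr):
    """MMSE of estimating a single +/-1 symbol from its noisy observation."""
    return 1.0 - _gauss_expect(lambda z: math.tanh(snr - math.sqrt(snr) * z))

def derivative_gap(snr=1.0, step=1e-4):
    """|dC/dsnr - mmse/2|, with the derivative taken by central differences."""
    slope = (capacity_nats(snr + step) - capacity_nats(snr - step)) / (2 * step)
    return abs(slope - 0.5 * mmse_bitwise(snr))

if __name__ == "__main__":
    print(derivative_gap())
```

A gap close to zero is consistent with the derivative of $C(\mathrm{SNR})$ being half the bitwise MMSE. The factor $\frac{1}{2}$ presumes natural logarithms; with base-2 logarithms an extra $\log_2 e$ factor would appear.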



□ 



C. Analysis of $I(X_{1,i}, X_{2,i}; Y_{1,i})$

We now prove inequality (e) in the string of equations leading
to (92) and inequality (d) in the string of equations leading
to (93). Our proof relies on the properties of "good" codes
for the BIAWGN. Specifically, we show that the marginal
distributions of the individual code symbols $X_{1,i}$ and $X_{2,i}$,
$i = 1, \ldots, n$, cannot stray too far from the uniform distribution
over $\{\pm 1\}$, which is the capacity-achieving distribution over
the BIAWGN [21, Theorem 4.5.1]. Our proof is a variation of
a similar result by [54, Theorem 4].



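$$\begin{aligned}
\sum_{i=1}^{n} I(X_{1,i}, X_{2,i}; Y_{1,i}) &\overset{(a)}{=} \sum_{i=1}^{n} i_2\left(p_{1i} \times p_{2i}\right) \\
&\overset{(b)}{\le} n \cdot i_2\left(\frac{1}{n} \sum_{i=1}^{n} \left(p_{1i} \times p_{2i}\right)\right) \\
&\overset{(c)}{=} n \cdot i_2\left(p^* \times p^* + o(1)\right) \\
&\overset{(d)}{=} n \cdot i_2\left(p^* \times p^*\right) + n \cdot o(1) \\
&\overset{(e)}{=} n \cdot I(X_1^*, X_2^*; Y_1^*) + n \cdot o(1)
\end{aligned} \tag{96}$$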



In (a), we have made the following definitions. $i_2(\cdot)$ is a
function whose argument is a probability function $p(x_1, x_2)$
where $(x_1, x_2) \in \{0, 1\}^2$. Its value is $I(X_1, X_2; Y_1)$ where
$(X_1, X_2)$ are distributed as $p(x_1, x_2)$ and $Y_1$ is related to them
via the transitions of the interference channel, (39). $p_{1i}$ is a
probability function over $x_1 \in \{0, 1\}$, corresponding to the
distribution of $X_{1,i}$. $p_{2i}$ is similarly defined, corresponding to
$X_{2,i}$. $p_{1i} \times p_{2i}$ is defined by,

$$p = p_{1i} \times p_{2i} \;\Rightarrow\; p(x_1, x_2) = p_{1i}(x_1) \cdot p_{2i}(x_2) \quad \forall (x_1, x_2) \in \{0, 1\}^2$$

The independence between $X_{1,i}$ and $X_{2,i}$, implied by equality
(a), follows from the independence between the messages $W_1$
and $W_2$, as in [13, Equation (14.122)].

Inequality (b) follows by Jensen's inequality and the concavity
of the mutual information as a function of the marginals of
its distributions, [13, Theorem 2.7.4]. In (c), we have defined
$p^*$ to be the distribution of $X_1^*$ (and of $X_2^*$). The justification
for this equality will be provided shortly. (d) follows by the
continuity of $i_2(\cdot)$, and (e) follows by its above definition.

To prove equality (c) above, we consider communication
over a point-to-point BIAWGN with an SNR equal to the
Shannon limit for rate $R$, SNR*. We let $i_1(p)$ denote $I(X; Y)$
where $X$ takes the value 1 with probability $p$ and $-1$ with
probability $1 - p$. $Y$ is related to $X$ via the transitions of the
above-mentioned BIAWGN, see (0.

27 Specifically, [23, Equation (17)] corresponds to mmse(bitwise; snr) and
[23, Equation (18)] corresponds to C(SNR).

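$$\begin{aligned}
n \cdot i_1(1/2) &\overset{(a)}{=} I(\mathbf{X}_1; \mathbf{Y}_1) + n \cdot o(1) \\
&\overset{(b)}{\le} \sum_{i=1}^{n} I(X_{1,i}; Y_{1,i}) + n \cdot o(1) \\
&\overset{(c)}{=} \sum_{i=1}^{n} i_1(\pi_{1i}) + n \cdot o(1) \\
&\overset{(d)}{=} \sum_{i=1}^{n} i_1(\tilde{\pi}_{1i}) + n \cdot o(1) \\
&\overset{(e)}{\le} n \cdot i_1\left(\frac{1}{n} \sum_{i=1}^{n} \tilde{\pi}_{1i}\right) + n \cdot o(1) \\
&\overset{(f)}{\le} n \cdot i_1(1/2) + n \cdot o(1)
\end{aligned}$$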



In (a), $\mathbf{Y}_1$ corresponds to the output of the above-mentioned
BIAWGN channel, when provided with $\mathbf{X}_1$ as its input.
We have relied on the fact that as the capacity-achieving
distribution for the BIAWGN corresponds to $p = 1/2$, the
capacity of the BIAWGN is $i_1(1/2)$. The equality now follows
by the same arguments as equality (a) in the string of equations
ending with (95), relying on the "goodness" of $\{C_{1,n}\}_{n=1}^{\infty}$.
(b) follows as [13, Equation (8.104)], relying on the
memorylessness of the BIAWGN. In (c), we have defined $\pi_{1i}$ to
be the probability that $X_{1,i}$ is equal to 1. In (d), we have
defined $\tilde{\pi}_{1i} = \min(\pi_{1i}, 1 - \pi_{1i})$ and the equality follows by
the obvious symmetry of $i_1(\cdot)$. In (e), we have again applied
Jensen's inequality and [13, Theorem 2.7.4]. In (f), we have
relied on the fact that the maximum of $i_1(\cdot)$ is achieved at
1/2, as this corresponds to the capacity-achieving distribution
of the BIAWGN.
We now have,



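$$\lim_{n \to \infty} i_1\left(\frac{1}{n} \sum_{i=1}^{n} \tilde{\pi}_{1i}\right) = i_1(1/2)$$

The function $i_1(\cdot)$ achieves its maximum uniquely at $p = 1/2$.
We thus obtain,

$$\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \tilde{\pi}_{1i} = \frac{1}{2} \tag{97}$$

We now define,

$$f_1(n) \triangleq \frac{1}{2} - \frac{1}{n} \sum_{i=1}^{n} \tilde{\pi}_{1i} \tag{98}$$

and,

$$I_1(n) = \left\{ i : \tilde{\pi}_{1i} > \frac{1}{2} - \sqrt{f_1(n)} \right\}$$

By a simple argument, relying on (98) and the fact that $\tilde{\pi}_{1i} \le
1/2$ for all $i$, we have,

$$\left|I_1(n)^c\right| \le \sqrt{f_1(n)} \cdot n$$

where $I_1(n)^c$ denotes the complement set of $I_1(n)$. Note that
$I_1(n)$ satisfies,

$$i \in I_1(n) \;\Rightarrow\; \frac{1}{2} - \sqrt{f_1(n)} < p_{1i}(x) < \frac{1}{2} + \sqrt{f_1(n)} \quad \forall x \in \{\pm 1\}$$

where $p_{1i}(\cdot)$ is as defined above. We similarly define $f_2(n)$
and $I_2(n)$. Equation (96) now follows by the observation that
$f_1(n)$ and $f_2(n)$ approach zero with $n$, and by the above
definition of $p^*$.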



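D. Analysis of $H(W_2 \mid W_1, \mathbf{Y}_1)$

We now justify inequality (a) in the string of equations
leading to (94).

$$\begin{aligned}
H(W_2 \mid W_1, \mathbf{Y}_1) &\le H(W_2, \mathbf{X}_2 \mid W_1, \mathbf{Y}_1) \\
&= H(\mathbf{X}_2 \mid W_1, \mathbf{Y}_1) + H(W_2 \mid \mathbf{X}_2, W_1, \mathbf{Y}_1) \\
&\overset{(a)}{=} H(\mathbf{X}_2 \mid W_1, \mathbf{Y}_1) + H(W_2 \mid \mathbf{X}_2) \\
&\overset{(b)}{\le} H(\mathbf{X}_2 \mid W_1, \mathbf{Y}_1) + H(W_2 \mid \mathbf{Y}) \\
&\overset{(c)}{=} H(\mathbf{X}_2 \mid W_1, \mathbf{Y}_1) + n \cdot o(1)
\end{aligned}$$

(a) follows by the Markov chain relation $W_2 \leftrightarrow \mathbf{X}_2 \leftrightarrow
(W_1, \mathbf{Y}_1)$. In (b), we have defined $\mathbf{Y}$ to be the output of a
BIAWGN channel, whose SNR is equal to SNR*, which
is provided with the input $\mathbf{X}_2$. The inequality follows by the
data processing inequality, using the Markov chain relation
$W_2 \leftrightarrow \mathbf{X}_2 \leftrightarrow \mathbf{Y}$. By the "goodness" of $\{C_{2,n}\}_{n=1}^{\infty}$, recalling
Definition 1, the probability of error, when decoding $W_2$ from
$\mathbf{Y}$, must approach zero with $n$. Equality (c) now follows, using
Fano's inequality.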

Appendix VIII
Results for Sec. V

A. Optimization of Codes for Erasure Relay Channels

We now elaborate our algorithm for the design of LDPC
codes for erasure relay channels, as discussed in Sec. V-A.
The input to the algorithm is a pair $(\lambda, \rho)$. $\hat{\delta}_2$ is obtained from
$(\lambda, \rho)$ by selecting the minimum value such that $I^+(\hat{\delta}_2)$, as
defined in (28), is less than or equal to $C_0$. $\rho$ is kept constant
throughout the iterations of the algorithm, and $\lambda$ is iteratively
improved. In our description below, we let $\lambda$ denote the left
edge distribution at the beginning of an iteration, and $\lambda^+$
the improved distribution obtained at the iteration's end. We
also let $\hat{\delta}_2^+$ be obtained from $(\lambda^+, \rho)$ in the same way as $\hat{\delta}_2$
was obtained from $(\lambda, \rho)$. The algorithm seeks to maximize
the design rate corresponding to $(\lambda^+, \rho)$ while requiring that
$\lambda^+$ be "close enough" to $\lambda$ (as explained below) so that
$(\lambda^+, \rho, \hat{\delta}_2^+)$ will likely still be admissible (see Sec. V-A). The
algorithm stops when a non-admissible triplet is obtained.

We define "closeness" by,



^A+.pf)(x 2 ,x 3 )-pW(x 2l x 3 ) 

i 

v 



< 



\h(n) c \ < VfW)-n 



Vx 2 ,x 3 £{0,e},£=l,...,t (99) 
where pf and P^ l) are computed by an application 



of sim-DE corresponding to (A, p, S 2 ). Pjj ^ and P£> denote the same as the above two, but replacing s^' v , sf , s^ ,v , 5% 
rightbound message distributions (see Sec. IHI-El i. Each Pjj 



j(0 



are linear in A + . The third and fourth sets of inequalities are 



is a singleton distribution, which equals the contents of the
brackets in (48) and is computed as a byproduct of sim-DE.
$\eta > 0$ is a design parameter (we experimented with $\eta = 0.1$).
The constraints (99) are linear (see footnote 28) in $\lambda_i^+$. We augment them
with the requirement that the components of $\lambda^+$ sum to 1 and
be confined to the range [0,1]. By (8), maximization of the
design rate is equivalent to maximizing $\sum_i \lambda_i^+ / i$. Thus, the
maximization problem is linear, and can be solved by a linear
program.
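The equivalence between maximizing the design rate and maximizing $\sum_i \lambda_i^+ / i$ can be seen from the standard design-rate formula for an LDPC ensemble with edge-perspective degree distributions, $R = 1 - \left(\sum_j \rho_j / j\right) / \left(\sum_i \lambda_i / i\right)$, which is assumed here to be the content of (8). The sketch below is illustrative only; the dictionary representation and the function name are assumptions of this example.

```python
def design_rate(lam: dict, rho: dict) -> float:
    """Design rate of an LDPC ensemble.

    lam[i] is the fraction of edges attached to variable nodes of degree i,
    rho[j] the fraction of edges attached to check nodes of degree j.
    """
    sum_lam = sum(weight / degree for degree, weight in lam.items())
    sum_rho = sum(weight / degree for degree, weight in rho.items())
    return 1.0 - sum_rho / sum_lam

if __name__ == "__main__":
    # (3,6)-regular ensemble: rate 1/2.
    print(design_rate({3: 1.0}, {6: 1.0}))
    # Shifting edge mass to degree-2 variable nodes raises sum(lam_i / i),
    # and with rho fixed this raises the design rate.
    print(design_rate({2: 0.5, 3: 0.5}, {6: 1.0}))
```

With $\rho$ held fixed, the rate is an increasing function of $\sum_i \lambda_i / i$, which is linear in $\lambda$, so the objective fits a linear program.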

B. Optimization of Codes for Symmetric BIAWGN Interference
Channels

The optimization algorithm proceeds along lines similar to
those of Appendix VIII-A. Optimization begins with an admissible
$(\lambda, \rho)$ pair. We define an admissible pair as one for which
the bit error rate at the output of each destination
decoder, with respect to the codeword transmitted by the
corresponding source (but not the interfering codeword), and
as computed by density evolution, is sufficiently low (typically
we require a BER of at most $10^{-5}$). Optimization proceeds by
attempting to iteratively improve $\lambda$, so that at each iteration,
the design rate is increased, without violating the admissibility
of $(\lambda, \rho)$. To achieve this, we limit our search range to $\lambda^+$ that
are "close" to $\lambda$, as defined below, where $\lambda$ and $\lambda^+$ are defined
as in Appendix VIII-A.

We define "closeness" by four sets of inequalities. The first 
set is given by, 



t-i) 



rj > is a fixed constant, sf' 1 ' and sf' are computed 



W = l, 



,t 



based on an application of density evolution with (X,p). 
is obtained from the rightbound message distribution P| 
with the primary decoder (Definition [6]), at iteration £, by 



'I 

w 



<f>(P^), where, 

$(P) = E [log (1 + e" 



where the L is a random variable distributed as P. Each sV 



is similarly obtained from the singleton distribution P x 
defined as the intermediate distribution computed at iteration 
£ of density evolution, corresponding to rightbound messages 
computed at variable nodes of degree i at iteration £ of soft- 
IC-BP. 

The second set of inequalities resembles the first, with 



A+, 



and s± replaced by A H 



and 



A+ 



is 



as 0. 



and Si are defined as s\ 



obtained from A 
and s\ , except that they are based on the distributions 
of messages from variable to state nodes, as computed by 
density evolution. Note that by multiplying both sides of the 
inequalities by ^2 k {X^ /k), we again obtain inequalities that 

28 To see this, observe that any constraint |a| < b is equivalent to the two 
constraints a < b and a > —6. 



with Sj , )§2 >*2 > computed based on distributions 
corresponding to the interference decoder (Definition |6). 

We have also found it useful to further restrict $\lambda^+$ by
the following constraint, which is reminiscent of the stability
condition [49],

[...] $\sum_j (j-1)\rho_j$ [...] $E\left[e^{-L/2}\right]$ [...]

where $\eta' > 0$ is a small constant (we experimented with
$\eta' = 0.02$). $L$ is a random variable, distributed as
$P^{(t,2)}$, the singleton distribution corresponding to the last
iteration of the primary decoder. As in Appendix VIII-A, we further
augment these inequalities by requiring the components of
$\lambda^+$ to sum to 1 and to be confined to the range $[0,1]$.
By seeking to maximize $\sum_i \lambda_i^+/i$, we again obtain a
linear maximization problem, which can be solved by linear
programming.
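The role of the objective $\sum_i \lambda_i^+/i$ can be illustrated with a minimal sketch. It assumes the standard design-rate expression for an LDPC ensemble with edge-perspective degree distributions, $R = 1 - (\sum_j \rho_j/j)/(\sum_i \lambda_i/i)$; the function name `design_rate` and the dictionary representation are illustrative choices, not notation from this paper. With $\rho$ held fixed, $R$ increases with the objective, and $R$ becomes negative whenever the graph has more check nodes than variable nodes.

```python
def design_rate(lam, rho):
    """Design rate of an LDPC ensemble (standard definition, assumed here).

    lam, rho: dicts mapping a node degree to the fraction of edges
    attached to variable (lam) or check (rho) nodes of that degree.
    """
    var_term = sum(frac / deg for deg, frac in lam.items())  # sum_i lambda_i / i
    chk_term = sum(frac / deg for deg, frac in rho.items())  # sum_j rho_j / j
    return 1.0 - chk_term / var_term


# (3,6)-regular ensemble.
rate_regular = design_rate({3: 1.0}, {6: 1.0})

# Moving edge mass to lower variable degrees raises sum_i lambda_i / i.
rate_shifted = design_rate({2: 0.5, 3: 0.5}, {6: 1.0})

# More check nodes than variable nodes: a negative design rate.
rate_negative = design_rate({4: 1.0}, {3: 1.0})
```

In this sketch the admissibility check (density evolution reaching the target BER) is deliberately left out; only the dependence of the rate on $\lambda$ is shown.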

The above procedure is prone to becoming trapped at local maxima,
as follows. In our application of the procedure, we typically
select the initial admissible pair $(\lambda, \rho)$ to have a very
low design rate (even negative-valued). At such low rates, it is
usually possible to achieve complete (rather than partial) decoding
of both the primary and interference codewords. As the iterations
of the optimization procedure progress, the design rate of the
pair $(\lambda, \rho)$ increases. However, the procedure remains
confined to pairs with which complete decoding of both codewords is
possible. The maximum possible rate in such conditions is
upper bounded by $R_{\mathrm{MUD}}$ (see [40]).

This problem is easily corrected by enforcing partial decoding
upon the interference decoder, even when complete decoding is
possible. That is, we examine a variant of soft-IC-BP, where the
interference decoder stops computing new messages (e.g. rightbound
and leftbound) after the value $s^{(\ell)}$ (defined above) has
dropped below a predetermined threshold. Once the optimization
procedure has progressed and the design rate of the pair
$(\lambda, \rho)$ has sufficiently increased, it is possible
to relax this enforcement.



C. Details of the Application of HK in Sec. V-B

Our discussion below assumes the results and notation
of [25, Sec. III]. As noted in Sec. II-A, HK achieves its
remarkable performance by constructing codes (here denoted
$\mathcal{X}_1$ and $\mathcal{X}_2$) that are each obtained by
combining two auxiliary codes, $\mathcal{U}_i$ and $\mathcal{W}_i$,
$i = 1,2$, with rates $S_i$ and $T_i$, respectively.
Destination 1, for example, decodes the codewords
$u_1 \in \mathcal{U}_1$ and $w_1 \in \mathcal{W}_1$, produced at its
corresponding source, as well as $w_2 \in \mathcal{W}_2$, amounting
to a partial decoding of the interfering codeword from
$\mathcal{X}_2$.

The codes $\mathcal{U}_i$ and $\mathcal{W}_i$ are generated randomly,
by independent selection of the components of their codewords
according to the distributions of random variables $U_i$ and $W_i$,
respectively. In our application of HK, we assigned
$U_i \sim \mathrm{Bernoulli}(0.055)$ and
$W_i \sim \mathrm{Bernoulli}(1/2)$. We also defined
$X_i = \mathrm{BPSK}(U_i \oplus W_i)$, $i = 1,2$, where $\oplus$
denotes modulo-2 addition and the function BPSK maps the digits
$\{0, 1\}$ to $\{1, -1\}$. This means that the codewords of
$\mathcal{X}_i$ are similarly obtained by applying the above
operation componentwise to pairs of codewords $(u_i, w_i)$ from
$\mathcal{U}_i$ and $\mathcal{W}_i$. An evaluation of
the rate implied by [25, Theorem 3.1] gives 0.333 bits. More
precisely, this figure is obtained by maximizing
$S_1 + T_1 = S_2 + T_2$ (we restricted $S_1 = S_2$ and
$T_1 = T_2$), as defined there, subject to
[25, Equations (3.2)-(3.15)]. The maximizing choices were
$S_i = 0.101$ bits and $T_i = 0.231$ bits, respectively.
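The componentwise construction $X = \mathrm{BPSK}(U \oplus W)$ can be sketched as follows. This is a minimal illustration that draws a single pair of auxiliary codewords and combines them; the function names `bpsk` and `draw_codeword`, the block length, and the seed are hypothetical choices, and a full codebook of the stated rates would contain many such independently drawn codewords.

```python
import random


def bpsk(bit):
    """Map the digit 0 to +1 and the digit 1 to -1."""
    return 1 if bit == 0 else -1


def draw_codeword(n, p_u=0.055, p_w=0.5, rng=None):
    """Draw one (u, w) pair with i.i.d. Bernoulli components and
    combine it componentwise into x = BPSK(u XOR w)."""
    rng = rng or random.Random()
    u = [1 if rng.random() < p_u else 0 for _ in range(n)]
    w = [1 if rng.random() < p_w else 0 for _ in range(n)]
    x = [bpsk(ub ^ wb) for ub, wb in zip(u, w)]
    return u, w, x


# Illustrative block length and seed (assumptions, not from the paper).
u, w, x = draw_codeword(20000, rng=random.Random(0))
```

Because $W$ is uniform and independent of $U$, the components of $U \oplus W$ are themselves uniform, so the transmitted symbols are equiprobable even though $U$ is heavily biased.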

Note that our use of binary random variables (rather than
real-valued ones, as in applications of HK for the AWGN
interference channel, e.g. [18]), as well as of modulo-2 addition,
follows from the channel's binary input alphabet.

Consider sequences of codes $\{\mathcal{X}_{i,n}\}_{n=1}^{\infty}$,
$i = 1,2$, where $\mathcal{X}_{i,n}$ has block length $n$,
constructed as described above. We now verify that each such
code sequence is point-to-point "bad" for the BIAWGN channel, in
the sense of Definition 4, with a probability that approaches 1
with $n$ (the probability being derived from the above-mentioned
random generation of the codes). Note that this assertion also
holds by Theorem 7, relying on the fact that the rate of the code
sequences (0.333) exceeds both $R_{\mathrm{MUD}}$ and
$R_{\mathrm{SUD}}$ (see Sec. V-B). For simplicity
of notation, we drop the indices $i$ and $n$ in the sequel.

To obtain our result, consider communication over a
point-to-point BIAWGN channel using a code $\mathcal{X}$ generated
as above. We wish to show that reliable communication requires an
SNR that is greater than the Shannon limit for rate 0.333.
Successful decoding in this setting recovers the codewords
$u \in \mathcal{U}$ and $w \in \mathcal{W}$ which produced the
transmitted $x \in \mathcal{X}$, as byproducts. Thus, the
communication setting is equivalent to that of a
multiple-access channel (see e.g. [13, Sec. 14.3]), where two
users transmit the codewords $u$ and $w$, respectively, and the
receiver obtains,

$$y = \mathrm{BPSK}(u \oplus w) + z \qquad (100)$$

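The operation BPSK is applied componentwise and $z$ is zero-mean
i.i.d. AWGN whose components have variance
$\sigma^2 = 1/\mathrm{SNR}$. Invoking
[13, Equations (14.99), (14.111)], the following conditions are
necessary for reliable communications,

$$S < \frac{1}{n} I(\mathbf{U}; \mathbf{Y} \mid \mathbf{W}) + o(1) \qquad (101)$$

$$T < \frac{1}{n} I(\mathbf{W}; \mathbf{Y} \mid \mathbf{U}) + o(1) \qquad (102)$$

$$S + T < \frac{1}{n} I(\mathbf{U}, \mathbf{W}; \mathbf{Y}) + o(1) \qquad (103)$$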

where the random vectors $\mathbf{U}$, $\mathbf{W}$ and $\mathbf{Y}$
correspond to the transmitted codewords from $\mathcal{U}$ and
$\mathcal{W}$ and to the channel output, respectively. By methods
similar to [63, Lemma 8], relying on the above-mentioned random
construction of $\mathcal{U}$ and $\mathcal{W}$, and invoking the
symmetry of the BIAWGN channel, we obtain, with a probability that
approaches 1 with $n$,

$$\frac{1}{n} I(\mathbf{U}; \mathbf{Y} \mid \mathbf{W}) < I(U; Y \mid W) + o(1)$$

where $U$ and $W$ are independently distributed, and $Y$ is related
to them in the same way as in (100). Thus, (101) translates to,

$$S < I(U; Y \mid W) + o(1) \qquad (104)$$



Similarly, (102) and (103) imply,

$$T < I(W; Y \mid U) + o(1) \qquad (105)$$

$$S + T < I(U, W; Y) + o(1) \qquad (106)$$
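The single-letter quantities in these three bounds can be evaluated numerically. The following is a minimal sketch under stated assumptions: it uses the fact that, given $W$, the channel input $\mathrm{BPSK}(U \oplus W)$ is a binary symbol with bias 0.055, while given $U$ (or given nothing) the input is uniform because $W \sim \mathrm{Bernoulli}(1/2)$, so the second and third mutual informations both equal the mutual information of the BIAWGN channel with uniform input. The function names, the trapezoidal integration, and its step count are illustrative choices, not the authors' evaluation procedure.

```python
import math


def gaussian_pdf(y, mean, sigma):
    return math.exp(-0.5 * ((y - mean) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def bpsk_mutual_information(p, snr, steps=4000):
    """I(X;Y) in bits for Y = X + Z, with P(X = -1) = p, P(X = +1) = 1 - p
    and Z ~ N(0, 1/snr). Computed as h(Y) - h(Z) by trapezoidal integration."""
    sigma = 1.0 / math.sqrt(snr)
    lo, hi = -1.0 - 10 * sigma, 1.0 + 10 * sigma
    dy = (hi - lo) / steps
    h_y = 0.0
    for k in range(steps + 1):
        y = lo + k * dy
        f = (1 - p) * gaussian_pdf(y, 1.0, sigma) + p * gaussian_pdf(y, -1.0, sigma)
        if f > 0.0:
            weight = 0.5 if k in (0, steps) else 1.0
            h_y -= weight * f * math.log2(f) * dy
    h_z = 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)
    return h_y - h_z


def mac_bounds(snr, p_u=0.055):
    """Return (I(U;Y|W), I(W;Y|U), I(U,W;Y)) in bits at the given SNR."""
    i_u_given_w = bpsk_mutual_information(p_u, snr)   # biased input
    i_w_given_u = bpsk_mutual_information(0.5, snr)   # uniform input
    i_joint = i_w_given_u                             # U xor W is uniform
    return i_u_given_w, i_w_given_u, i_joint


bounds = mac_bounds(0.7684)
```

Comparing the returned triple with $(S, T, S + T)$ over a range of SNR values indicates which of the three constraints is the binding one.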

An evaluation of (104), (105) and (106) reveals that the
inequalities require an SNR of at least 0.7684 to be satisfied.
This value, which is the minimum SNR required for reliable
communications to be possible, exceeds the above-mentioned
Shannon limit for rate 0.333 bits (Definition 1), which is
$\mathrm{SNR}^* = 0.5941$. By Definition 4, this produces our
desired result.
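The Shannon limit quoted above can be cross-checked numerically. The sketch below assumes the standard expression for the capacity of the BIAWGN channel with uniform input, $C = 1 - E[\log_2(1 + e^{-L})]$, where the log-likelihood ratio $L$ is Gaussian with mean $2\,\mathrm{SNR}$ and variance $4\,\mathrm{SNR}$ when $+1$ is transmitted (with $\mathrm{SNR} = 1/\sigma^2$ as above). It then bisects for the SNR at which the capacity equals the target rate. The function names, integration grid, and search interval are illustrative assumptions.

```python
import math


def biawgn_capacity(snr, steps=4000):
    """Capacity in bits of Y = X + Z, X uniform on {+1, -1}, Z ~ N(0, 1/snr)."""
    mean, std = 2.0 * snr, 2.0 * math.sqrt(snr)  # LLR statistics given X = +1
    lo, hi = mean - 10 * std, mean + 10 * std
    dl = (hi - lo) / steps
    loss = 0.0
    for k in range(steps + 1):
        llr = lo + k * dl
        pdf = math.exp(-0.5 * ((llr - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoidal rule
        loss += weight * pdf * math.log2(1.0 + math.exp(-llr)) * dl
    return 1.0 - loss


def shannon_limit(rate, lo=0.01, hi=4.0, tol=1e-6):
    """Smallest SNR whose capacity reaches `rate`, found by bisection
    (capacity is increasing in SNR)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if biawgn_capacity(mid) < rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


snr_star = shannon_limit(0.333)
```

Under these assumptions, `snr_star` can be compared against both the quoted Shannon limit and the minimum SNR of 0.7684 obtained from (104)-(106).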